Journal of the

AMERICAN STATISTICAL

ASSOCIATION








DECEMBER 1961 







VOLUME 56







American Statistical Association
1757 K St., N.W., Washington 6, D.C.
Organized November 27, 1839, Incorporated 1841


The American Statistical Association is a scientific and educational organization. Its membership is not confined to professional statisticians but includes economists, business executives, research directors, government officials, university professors, and others who are seriously interested in the application of statistical methods to practical problems, in the development of more useful methods, and in the improvement of basic statistical data. Engineers, mathematicians, biologists, actuaries, sociologists, psychologists, and representatives of many other professions are included in the membership of the Association.


ANNUAL DUES*

Student membership
Institutional membership, minimum

* Includes subscription to the Journal of the American Statistical Association and to The American Statistician.


Subscription rate: $8 per year. Prices for back issues available on request.
For further information about the Association and application forms, write the Secretary, 1757 K St., N.W., Washington 6, D.C.


The present Institutional Members are: 


American Telephone & Telegraph Company
... Company
Armstrong Cork Company
... Trust Company
Bell Telephone Laboratories
California Texas Oil Corporation
Chrysler Corporation
Deere & Company
Dun & Bradstreet, Inc.
The Equitable Life Assurance Society of the U. S.
Ford Motor Company
General Dynamics Corporation
General Motors Corporation
George Banta Co., Inc.
Humble Oil and Refining Co.
International Brotherhood of Teamsters, Chauffeurs, Warehousemen and Helpers of America
International Business Machines Corp.
International Harvester Company





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


The Editors welcome the submission of manuscripts for possible publication. They 
should be typewritten entirely double-spaced, including footnotes, and two copies should 
be sent to the Editor, Clifford Hildreth, Wells Hall, Section A, Michigan State University, 
East Lansing, Michigan. Books for review should be sent to the same address. Unsolicited 
book reviews are not accepted, but suggestions of titles for review are welcome. 


EDITOR 
CLIFFORD HILDRETH, Michigan State University


ASSISTANTS TO THE EDITOR 
PHYLLIS QUINN
BARBARA BOEHLKE


ASSOCIATE EDITORS 


JEROME CORNFIELD, National Institutes of Health
MARC NERLOVE, Stanford University
ROBERT FERBER, University of Illinois
INGRAM OLKIN, Stanford University
WILLIAM G. MADOW, Stanford Research Institute
JOHN W. PRATT, Harvard University
ROSEDITH SITGREAVES, Columbia University


ADVISORY PANEL OF FORMER EDITORS 


WILLIAM G. COCHRAN (1945-50), Harvard University
FREDERICK F. STEPHAN (1935-40), Princeton University
FRANK A. ROSS (1926-34, 41-5), Thetford, Vermont
W. ALLEN WALLIS (1950-60), University of Chicago


Corrigenda: Readers and authors are urged to submit to the Editor notices of 
errors found in this or any previous volume. These will be published once a year, 
in the December issue. 


Published Quarterly by the AMERICAN STATISTICAL ASSOCIATION 
Publication Office: Curtis Reed Plaza, Menasha, Wisconsin. Editorial Office: Wells Hall, Section A, Michigan State University, East Lansing, Michigan. Business Office: 1757 K St., N.W., Washington 6, D.C. Acceptance for mailing at special rate of postage provided for in the Act of February 28, 1925, embodied in paragraph 4, section 588, P. L. & R., authorized March 25, 1936. Second class postage paid at Menasha, Wisconsin.
Anyone wishing to change his mailing address should allow eight weeks' notice. A ... of the address taken from an issue of the periodical should accompany the change of address request.











By Samuel S. Wilks, Princeton University. This book offers a systematic and unified mathematical exposition of major results and topics in mathematical statistics, giving particular emphasis to the important developments of the past quarter-century. The organization and presentation of material have been developed and tested over a twenty-year period in courses given by the author at Princeton University, with the result that the book affords a combination of broad coverage and coherent unity that is wholly unusual. Designed for readers with a good background in mathematics, the book covers the basic mathematical statistics underlying the theory of sampling, statistical estimation, hypothesis testing, non-parametric statistical inference, sequential analysis, time series analysis, statistical decision functions, and multivariate statistical analysis. There are over 400 problems appended to the various chapters. 1962. Approx. 656 pages. Prob. $17.50.*


MANAGEMENT MODELS 
AND INDUSTRIAL APPLICATIONS 
OF LINEAR PROGRAMMING, Volumes I & II


By Abraham Charnes, Northwestern University; and William W. Cooper, 
Carnegie Institute of Technology. These volumes illustrate all aspects of the 
underlying theory of linear programming with concrete numerical examples 
accompanied by explanations which 1) carefully clarify the theories and 
examples, and 2) suggest further possible applications. Volume I: 1961. 467
pages. $11.75.* Volume II: 1961. Approx. 448 pages. $11.75.*


RECENT METHUEN 
MONOGRAPHS IN STATISTICS ... 
ANALYSING QUALITATIVE DATA 


By A. E. Maxwell, University of London. 1961. 163 pages. $3.00. 


STOCHASTIC POPULATION MODELS 
IN ECOLOGY AND EPIDEMIOLOGY 


By M. S. Bartlett, University of London. 1960. 90 pages. $2.00. 


By D. R. Cox, University of London; and W.L. Smith, University of North 
Carolina. 1961. In Press. 


FOUNDATIONS OF STATISTICAL INFERENCE 


Edited by George Barnard, University of London. 1961. In Press. 


* Also available in a textbook edition for college adoption. 


JOHN WILEY & SONS, INC. 
440 Park Avenue South, New York 16, N.Y. 


Please mention the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION in writing advertisers





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


VOLUME 56    DECEMBER 1961    NUMBER 296








ARTICLES 


Occupational Components of Educational Differences in Income . . . OTIS DUDLEY DUNCAN


Testing the Independence of Regression Disturbances . . . H. THEIL AND A. L. NAGAR


Rectifying Inspection of Lots . . . F. J. ANSCOMBE


Residence Histories and Exposure Residences for the United States Population
. . . KARL E. TAEUBER, WILLIAM HAENSZEL, AND MONROE G. SIRKEN


A Simple Theoretical Approach to Cumulative Sum Control Charts . . . N. L. JOHNSON


Statistical Methods for the Mover-Stayer Model . . . LEO A. GOODMAN
Forecasting Industrial Production . . . H. O. STEKLER
Non-Additivity in Two-Way Analysis of Variance . . . JOHN MANDEL


On Comparing Intensities of Association between Two Binary Characteristics in Two Different Populations . . . AGNES BERGER


Failure of Enumerators To Make Entries of Zero: Errors in Recording Childless Cases in Population Censuses . . . M. A. EL-BADRY


The Statistical Analysis of Industry Structure: An Application to Food Industries
. . . LEE E. PRESTON AND EARL J. BELL


Note on the Missing Plot Procedure in a Randomized Block Design
. . . JOHN LEROY FOLKS AND DAL LON WEST


Gamma Distribution in Acceptance Sampling Based on Life Tests
. . . SHANTI S. GUPTA AND PHYLLIS A. GROLL


A Bivariate Extension of the Exponential Distribution . . . JOHN E. FREUND
On the Resolution of Statistical Hypotheses . . . ROBERT V. HOGG
The Asymptotic Variances of Method of Moments Estimates of the Parameters of the Truncated Binomial and Negative Binomial Distributions . . . S. M. SHAH

A Nomograph for Computing Partial Correlation Coefficients
. . . RUTH W. LEES AND FREDERIC M. LORD


Stepwise Least Squares: Residual Analysis and Specification Error
. . . ARTHUR S. GOLDBERGER


NOTES ABOUT AUTHORS

CORRIGENDA

PUBLICATIONS RECEIVED
INDEX TO VOLUME 56, 1961, Nos. 293-296


BOOK REVIEWS 
NATIONAL BUREAU OF ECONOMIC RESEARCH, Demographic and Economic Change in Developed Countries . . . ALEXANDER GERSCHENKRON
ÅKERMAN, JOHAN, Theory of Industrialism: Causal Analysis and Economic Plans . . . AMARTYA KUMAR SEN
SALTER, W. E. G., Productivity and Technical Change . . . GLENN L. JOHNSON
DUNCAN, OTIS DUDLEY, CUZZORT, RAY P., AND DUNCAN, BEVERLY, Statistical Geography: Problems in Analyzing Areal Data . . . LESLIE KISH
CHURCHMAN, C. WEST, Prediction and Optimal Decision . . . NICHOLAS M. SMITH

BARTLETT, M. S., Stochastic Population Models in Ecology and Epidemiology . . . NORMAN T. J. BAILEY
HAUSER, PHILIP M. AND DUNCAN, OTIS DUDLEY, EDITORS, The Study of Population: An Inventory and Appraisal . . . GEORGE F. MAIR





FRIEND, IRWIN AND JONES, ROBERT, EDITORS, Proceedings of the Conference on Consumption and Saving, Volumes I and II . . . E. SCOTT MAYNES

QUACKENBUSH, G. G. AND SHAFFER, J. D., Collecting Food Purchase Data by Consumer Panel . . . WANE

HALMOS, PAUL R., Naive Set Theory . . . SIDNEY G. WINTER, JR.

NIDDITCH, P. H., Elementary Logic of Science and Mathematics . . . STEVEN OREY

NIDDITCH, P. H., Introductory Formal Logic of Mathematics . . . STEVEN OREY

TAKÁCS, LAJOS, Stochastic Processes: Problems and Solutions . . . LEO KATZ

ALEXANDER, HOWARD W., Elements of Mathematical Statistics . . . ISADORE BLUMEN

LOVEDAY, R., A Second Course in Statistics . . . GERALD J. LIEBERMAN

KOZELKA, R. M., Elements of Statistical Inference . . . PETER ZEHNA

HITOTSUBASHI UNIVERSITY, Annotated Economic Statistics of Japan for Postwar Years up to 1958 . . . M. BRONFENBRENNER

HEGELAND, HUGO, EDITOR, Money, Growth, and Methodology: In Honor of Johan Åkerman

NATIONAL BUREAU OF ECONOMIC RESEARCH, Towards a Firmer Basis of Economic Policy

UNITED NATIONS, DEPARTMENT OF ECONOMIC AND SOCIAL AFFAIRS, Yearbook of National Accounts Statistics

ROSENTHAL, ARNOLD J., EDITOR-IN-CHIEF, Quality Control Yearbook, 1960

THEIL, H., Contributions to Economic Analysis: Economic Forecasts and Policy. Second Revised Edition

NATIONAL EDUCATION ASSOCIATION OF THE UNITED STATES, Teacher Supply and Demand in the Public Schools, 1961

SURVEY RESEARCH CENTER, UNIVERSITY OF MICHIGAN, Boonente Saws Data

THORNDIKE, ROBERT L. AND HAGEN, ELIZABETH, Measurement and Evaluation in Psychology and Education

GUEST, P. G., Numerical Methods of Curve Fitting





1961 OFFICERS, AMERICAN STATISTICAL ASSOCIATION 


Board of Directors

President
MARTIN R. GAINSBRUGH

President-Elect
PHILIP M. HAUSER

Past President
MORRIS H. HANSEN

Vice-Presidents
GUY H. ORCUTT (1959-61)
GEORGE E. P. BOX (1960-62)
GEOFFREY H. MOORE (1961-63)

Directors
A. H. BOWKER (1959-61)
RAYMOND T. BOWMAN (1959-61)
HAROLD F. DORN (1960-62)
FREDERICK MOSTELLER (1960-62)
DOUGLAS G. CHAPMAN (1961-63)
DOROTHY M. GILFORD (1961-63)

Secretary-Treasurer & Executive Director
DONALD C. RILEY


Members of the Council
JOHN I. GRIFFIN
JAMES E. GRIZZLE
MORRIS HAMBURG
PAUL D. MINTON
NATHAN MORRISON
GOTTFRIED E. NOETHER
LOUIS J. PARADISO
SAMUEL B. RICHMOND
ERNEST RUBIN
HARRY SHARP
WILLIAM H. SHAW
ROBERT B. SPEARS
CONRAD TAEUBER
MAX S. WEINSTEIN
ALBERT T. SOMMERS
GEOFFREY BEALL
ROBERT J. BUEHLER
J. PARKER BURSK
HERBERT A. DAVID
ARTHUR M. DUTTON
CHURCHILL EISENHART
E. J. ENGQUIST, JR.
W. T. FEDERER
S. M. FREES, JR.
JOHN E. FREUND
RICHARD A. FREUND
MAURICE I. GERSHENSON
CLIFFORD G. HILDRETH
ROBERT G. HOOKE
J. STUART HUNTER
D. V. HUNTSBERGER
R. E. JOHNSON
NATHAN KEYFITZ
CLYDE Y. KRAMER
GERALD J. LIEBERMAN
ARTHUR S. LITTELL
HYMAN MENDUKE


Section Chairmen for 1961

ARTHUR M. DUTTON, Biometrics Section
LOUIS J. PARADISO, Business & Economic Statistics Section
G. J. LIEBERMAN, Section on Physical & Engineering Sciences
NATHAN KEYFITZ, Social Statistics Section
J. PARKER BURSK, Section on Training


Office Manager: EDGAR M. BISGYER





Tables and other aids to computation appearing in this Journal are abstracted and 
indexed in Mathematical Tables and Other Aids to Computation. 








EDITORIAL COLLABORATORS 


IRMA ADELMAN, Stanford University

WILLIAM R. ALLEN, Princeton, New Jersey

R. L. ANDERSON, North Carolina State College

FRED C. ANDREWS, University of Oregon

FRANK J. ANSCOMBE, Princeton University

KENNETH J. ARROW, Stanford University

RONALD H. BEATTIE, California Bureau of Criminal Statistics

GARY S. BECKER, National Bureau of Economic Research

C. B. BELL, San Diego State College

A. T. BHARUCHA-REID, University of Oregon

DONALD J. BOGUE, University of Chicago

J. C. G. BOOT, Netherlands School of Economics

RALPH A. BRADLEY, Florida State University

EDWARD C. BUDD, Yale University

ROBERT J. BUEHLER, Iowa State University

HERMAN CHERNOFF, Stanford University

GREGORY C. CHOW, Cornell University

CARL F. CHRIST, Johns Hopkins University

PAUL C. CLIFFORD, Montclair State College

A. CLIFFORD COHEN, JR., University of Georgia

KALMAN J. COHEN, Carnegie Institute of Technology

W. S. CONNOR, Research Triangle Institute

DUDLEY J. COWDEN, University of North Carolina

F. N. DAVID, University College, London, England

HERBERT T. DAVID, Iowa State University

ROBERT T. DAVIS, Stanford University

CALVERT L. DEDRICK, United States Bureau of the Census

W. EDWARDS DEMING, New York University

W. J. DIXON, University of California Medical Center, Los Angeles

HAROLD F. DORN, National Heart Institute

NORMAN R. DRAPER, University of Wisconsin

ACHESON J. DUNCAN, Johns Hopkins University

DAVID B. DUNCAN, Johns Hopkins University

MEYER DWASS, Northwestern University

RICHARD A. EASTERN, Stanford University

A. R. ECKLER, Bell Telephone Laboratories, Inc.

HARRY EISENPRESS, White Plains, New York

ROBERT B. FETTER, Yale University

JOHN FIRESTONE, City College of New York

BENJAMIN FRANK, United States Department of Justice

D. A. S. FRASER, University of Toronto

JOHN E. FREUND, Arizona State University

JOHN J. GART, Johns Hopkins University

SUDHISH GHURYE, Northwestern University

R. GNANADESIKAN, Bell Telephone Laboratories, Inc.

ARTHUR GOLDBERGER, University of Wisconsin

SELMA F. GOLDSMITH, United States Bureau of the Census


LEO A. GOODMAN, University of Chicago

FRANKLIN A. GRAYBILL, Colorado State University

SAMUEL GREENHOUSE, National Institute of Mental Health

J. ARTHUR GREENWOOD, University of Minnesota

E. J. GUMBEL, Columbia University

JOHN GURLAND, University of Wisconsin

ROBERT L. GUSTAFSON, Michigan State University

DAVID HAUTEY, Acadia University, Nova Scotia

W. J. HALL, University of North Carolina

MAX HALPERIN, General Electric Company

C. HORACE HAMILTON, North Carolina State College

H. LEON HARTER, Wright-Patterson Air Force Base

H. O. HARTLEY, Iowa State University

FRITZ HERZOG, Michigan State University

WALTER E. HOADLEY, Armstrong Cork Company

WASSILY HOEFFDING, University of North Carolina

ROBERT V. HOGG, University of Iowa

SIDNEY A. JAFFE, United States Department of Labor

M. VERNON JOHNS, JR., Stanford University

EUGENE A. JOHNSON, University of Minnesota

HOWARD L. JONES, University of Chicago

HAROLD A. KAHN, National Heart Institute

HYMAN B. KAITZ, United States Department of Labor

OSCAR KEMPTHORNE, Iowa State University

BRADFORD F. KIMBALL, Public Service Commission

EVELYN M. KITAGAWA, University of Chicago

JAMES W. KNOWLES, Joint Economic Committee, Congress of United States

WILLIAM H. KRUSKAL, University of Chicago

ROY R. KUEBLER, JR., University of North Carolina

ROBERT J. LAMPMAN, University of Wisconsin

STANLEY LEBERGOTT, United States Bureau of the Budget

JOSEPH LEV, University of the State of New York

GERALD LIEBERMAN, Stanford University

TA-CHUNG LIU, Cornell University

FREDERIC M. LORD, Educational Testing Service

EUGENE LUKACS, Catholic University of America

NATHAN MANTEL, National Cancer Institute

ELI MARKS, National Analysts, Inc.

ALBERT W. MARSHALL, Princeton University

J. G. MAULDON, University of California, Berkeley

PHILIP J. McCARTHY, Cornell University


(Continued on page 1033) 





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Number 296 DECEMBER, 1961 Volume 56 


OCCUPATIONAL COMPONENTS OF EDUCATIONAL
DIFFERENCES IN INCOME*


OTIS DUDLEY DUNCAN
University of Chicago 


The relationship between income and education, with occupation held constant, can be ascertained indirectly from incompletely cross-tabulated data, making use of formulas involved in the analysis of covariance. The between-occupation component of the difference in income level between categories of educational attainment is substantial. Occupation, therefore, is an important intervening variable in the translation of educational advantage into income advantage. The method may be useful in a variety of problems, familiar to demographers and social statisticians, whose analysis must be approached in a roundabout manner owing to the limited detail of published tabulations.


The salient functional connection among the three basic aspects of socio-economic status (education, occupation, and income) is simply stated: education qualifies the individual for participation in occupational life, and pursuit of an occupation yields him a return in the form of income. Occupation is, thus, the intervening activity linking education and income. Census data have been used to ascertain the regression of income on education [4, 8] and to investigate in some detail the relevance of education for occupational selection [3]. Such studies, however, have failed to complete the analysis with a systematic treatment of occupation as the intervening variable, because the decennial census reports have not provided a simultaneous cross-tabulation of income by occupation by education. Only recently has such a cross-tabulation, for a selected age group and based on a comparatively small sample, been issued by the Bureau of the Census. (See source note to Table 5.) The primary purpose of this paper is to illustrate an indirect method of making limited inferences for such a problem where the full cross-tabulation is not available. The technique is submitted as an alternative to that of indirect standardization or “expected cases” ordinarily employed by demographers in situations of this kind. While the emphasis of the discussion is on the analytical technique, the substantive results include some, though by no means all, of the information that fully cross-tabulated data would reveal. These substantive results, though perhaps not surprising, are nonetheless important: differences in income according to level of educational attainment are due, in large measure, to the fact that well-educated persons engage disproportionately in high-income occupations and poorly-educated persons in low-income occupations. The hypothesis that occupation is a major intervening factor is sustained by the analysis, and we thus secure an important part of the explanation of the results of previous research.

* An earlier version of this paper was presented at the 1961 annual meeting of the Population Association of America. The paper is a by-product of research conducted under project RG-5667, “Occupational Classification for Vital Statistics Use,” supported by a grant from the U. S. Public Health Service. The clerical assistance of Nathaniel Hare is gratefully acknowledged.

The paper’s methodological contribution consists of a more formal and complete exposition than is elsewhere available of techniques whose major elements are well known. In a conventional analysis-of-covariance setup, a dependent variable Y is regressed on an independent variable X within categories (groups) of an attribute Z. Using the C-notation of Walker and Lev ([10], ch. 15) for the relevant sums of squares and products, three regression coefficients and a correlation ratio are defined as follows: total regression, $\beta_T = C_{xyT}/C_{xxT}$; average within-group regression, $\beta_w = C_{xyW}/C_{xxW}$; between-group regression, $\beta_b = C_{xyB}/C_{xxB}$; correlation ratio (squared), independent variable on groups, $\eta^2_{xz} = C_{xxB}/C_{xxT}$. (Greek $\beta$ rather than $b$ is used for the regression coefficients to suggest that the exposition is in terms of population parameters, not sample estimates thereof.) The important point for the problem at hand, one seldom mentioned in statistics texts, is that it is not necessary to have the complete three-way classification of X, Y, and Z in order to obtain some of the information yielded by an analysis of covariance (just as it is not necessary to have a three-way correlation table to obtain partial regression coefficients in multiple regression analysis). In particular, although $\beta_w$ may be calculated directly from the within-group sums of squares and products, it may equally well be ascertained indirectly from the readily verified formula,


$$\beta_w = \frac{\beta_T - \beta_b\,\eta^2_{xz}}{1 - \eta^2_{xz}},$$
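The formula can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it uses a small set of hypothetical (group, x, y) observations (not census data) and a helper, `c_sums`, whose name is our own. It computes the total, between-group, and within-group sums of squares and products, then compares $\beta_w$ obtained directly from the within-group sums with the value given by the indirect formula, which uses only the total and between-group pieces.

```python
from collections import defaultdict


def c_sums(records):
    """Return total ("T"), between-group ("B") and within-group ("W")
    sums of squares and products as (C_xx, C_xy) pairs.

    records: list of (group, x, y) tuples.
    """
    n = len(records)
    mx = sum(x for _, x, _ in records) / n
    my = sum(y for _, _, y in records) / n

    # Collect observations by group and compute group means of x and y.
    groups = defaultdict(list)
    for g, x, y in records:
        groups[g].append((x, y))
    means = {
        g: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
        for g, obs in groups.items()
    }

    total = (
        sum((x - mx) ** 2 for _, x, _ in records),
        sum((x - mx) * (y - my) for _, x, y in records),
    )
    between = (
        sum(len(obs) * (means[g][0] - mx) ** 2 for g, obs in groups.items()),
        sum(len(obs) * (means[g][0] - mx) * (means[g][1] - my)
            for g, obs in groups.items()),
    )
    within = (
        sum((x - means[g][0]) ** 2 for g, x, _ in records),
        sum((x - means[g][0]) * (y - means[g][1]) for g, x, y in records),
    )
    return {"T": total, "B": between, "W": within}


# Hypothetical observations: (group, x, y). Illustrative only.
records = [("a", 1, 2), ("a", 2, 3), ("a", 3, 5),
           ("b", 2, 2), ("b", 4, 5), ("b", 5, 9),
           ("c", 6, 7), ("c", 7, 8)]

c = c_sums(records)
beta_t = c["T"][1] / c["T"][0]   # total regression
beta_b = c["B"][1] / c["B"][0]   # between-group regression
eta2 = c["B"][0] / c["T"][0]     # squared correlation ratio, x on groups

beta_w_direct = c["W"][1] / c["W"][0]                     # needs within-group data
beta_w_indirect = (beta_t - beta_b * eta2) / (1 - eta2)   # needs only T and B pieces
print(beta_w_direct, beta_w_indirect)
```

The two values agree up to floating-point rounding because the total sums split exactly into within-group and between-group parts.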


without making use of the within-group data. 

A modification of this approach was made for the present problem. While income (Y) and education (X) may be thought of as variables, the calculation of, say, $\beta_T$ from the available data is subject to certain difficulties. Class intervals, some of them open-ended, are of unequal sizes and have uncertain averages. Moreover, the regression of dollar income on number of years of schooling may be nonlinear, as suggested by Miller [8, p. 965]: “... a year spent in completing a given level of schooling (e.g., the fourth year in high school) yields a greater return than any of the years leading up to graduation.” While it is somewhat artificial to treat income and education as dichotomies, as is done below, in many problems where an indirect approach is required, attribute data are more likely to be encountered than are true variables.

Reports of the 1950 census cited subsequently provide a tabulation of years 
of school completed by major occupation group, a tabulation of total money 
income in 1949 by major occupation group, and a tabulation of income by 
number of school years completed. The approach here is to treat both income 
and educational attainment as dichotomous variables and major occupation 
groups as the several sub-populations of a conventional analysis-of-covariance 
setup with one criterion of classification. Table 1 indicates the notation for the 
four-fold table from which one may compute the coefficient of total regression, 
$\beta_T$, of income on education. If the high value of each variable is scored unity and the low value zero, this coefficient reduces to the difference between proportions, $D/A - (B - D)/(N - A)$.

TABLE 1. INCOME IN 1949 BY NUMBER OF SCHOOL YEARS COMPLETED,
FOR MALES 14 YEARS OLD AND OVER: 1950
[Numbers in millions]

                               Education (X)
                     Under 4 years      High school,
Income (Y)           of high school†    4 years or more    Total
Under $3,000*        28.8                9.1               37.9
$3,000 or more        8.7                8.0 (D)           16.7 (B)
Total                37.5               17.1 (A)           54.6 (N)

* Includes persons without income and persons for whom income was not reported.
† Includes persons with school years not reported.

Source: U. S. Bureau of the Census, 1950 Census of Population, Vol. IV, Special Reports, Part 5, Chapter B, “Education” (Washington: Government Printing Office, 1953), Table 12.

In this example, the proportion of high
school graduates with incomes of $3,000 or more is .472, as compared with .231 for persons who completed less than four years of high school; the difference, .241, is the value of $\beta_T$. The reader may visualize a similar four-fold table pertaining to each of the major occupation groups, of which there are 13, regarding
(as one must, to obtain arithmetical closure) “occupation not reported” and “not in experienced civilian labor force” as quasi-occupation groups for this analysis. (The latter category is subjected to some minor adjustments of frequencies to effect a reconciliation of slight discrepancies between the tabulations in the two census reports whence the data come.) We do not, however, have the internal cells of these tables, but only the marginal and grand totals (shown in Table 2), symbolized for the $i$th occupation group by $a_i$, $b_i$, and $n_i$, for the categories denoted by corresponding capital letters in Table 1. By definition,


$$\sum a_i = A, \qquad \sum b_i = B, \qquad \text{and} \qquad \sum n_i = N.$$
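As a check on the arithmetic, the total regression can be computed from the printed cells of Table 1, both as a difference between proportions and in the covariance form $(D - AB/N)/(A - A^2/N)$ that follows from scoring each variable 0 or 1. This is a sketch using only the rounded figures as printed (tenths of a million). With these figures the result is about .236 rather than the .241 given in the text; the gap presumably reflects the rounding of the tabulated counts, not an error in the formula.

```python
# Rounded cells of Table 1, in millions of males 14 years old and over.
A = 17.1   # high school, 4 years or more (X = 1)
B = 16.7   # income of $3,000 or more (Y = 1)
D = 8.0    # both X = 1 and Y = 1
N = 54.6   # all males

# Total regression as a difference between two proportions ...
diff_props = D / A - (B - D) / (N - A)
# ... and in covariance form, with 0/1 scoring of both variables.
beta_t = (D - A * B / N) / (A - A ** 2 / N)

print(round(diff_props, 3), round(beta_t, 3))  # prints 0.236 0.236
```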


Now, although one cannot compute the regression of income on education 
within each occupation group, for lack of the requisite cross-tabulations, one 
can compute the average within-group regression, defined for dichotomous 
variables, in the foregoing notation, as 


$$\beta_w = \frac{D - \sum a_i b_i/n_i}{A - \sum a_i^2/n_i}.$$


This may be seen to be rather analogous in form to the total regression, whose definition, given above, may be rewritten as $\beta_T = (D - AB/N)/(A - A^2/N)$. Because of this analogy, and since the total regression takes the form of a difference between proportions, the average within-group regression may be termed the “partial difference” between the same proportions. It is a pooling, in a special sense, of the several within-group differences (which can be computed only from cross-tabulated data). Like any average, it may be more or less representative of the quantities that it summarizes. In particular, from the value of $\beta_w$ one cannot necessarily infer the values of the individual within-group regressions (differences), nor can one even assume that all are of the same sign. In many situations, nevertheless, it is meaningful to have a “partial” that is merely some kind of average of individual “partials,” as is true in conventional multiple regression analysis, for example. Ascertaining the regression of income on education with occupation “partialled out” in the manner described gives us one piece of useful information on occupation as the intervening variable.

TABLE 2. INCOME IN 1949 AND EDUCATIONAL ATTAINMENT, BY MAJOR
OCCUPATION GROUP, FOR MALES 14 YEARS OLD AND OVER: 1950
[Numbers in millions]

                                                    Total     With incomes    With 4 years
Major occupation group                              males     of $3,000       high school
                                                              or more         or more
                                                    ($n_i$)   ($b_i$)         ($a_i$)
Total males, 14 and over                            54.6      16.7            17.1
Professional, technical and kindred workers          3.0       1.9             2.5
Farmers and farm managers                            4.2       0.8             0.7
Managers, officials and proprietors, except farm     4.3       2.7             2.3
Clerical and kindred workers                         2.7       1.3             1.6
Sales workers                                        2.6       1.2             1.5
Craftsmen, foremen and kindred workers               7.8       4.0             2.3
Operatives and kindred workers                       8.5       3.0             1.9
Private household workers                            0.08      0.004           0.01
Service workers, except private household            2.5       0.6             0.6
Farm laborers and foremen                            2.1       0.07            0.3
Laborers, except farm and mine                       3.6       0.6             0.5
Occupation not reported                              0.8       0.1             0.2
Not in experienced civilian labor force             12.5       0.6             2.7

Source: U. S. Bureau of the Census, 1950 Census of Population, Vol. IV, Special Reports, Part 5, Chapter B, “Education,” Table 12, and Part 1, Chapter B, “Occupational Characteristics,” Tables 10 and 19 (Washington: Government Printing Office, 1953 and 1956).

In addition to the difference (total regression) and partial difference (average within-group regression), the data available permit us to compute the between-group regression coefficient, defined as

$$\beta_b = \frac{\sum a_i b_i/n_i - AB/N}{\sum a_i^2/n_i - A^2/N}.$$

The between-group regression relates the proportion in each occupation group with a specified income ($b_i/n_i$) to the proportion with a specified level of educational attainment ($a_i/n_i$).
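One way to see what $\beta_b$ measures: the marginal formula coincides with the slope of a least-squares line fitted to the group proportions $q_i = b_i/n_i$ against $p_i = a_i/n_i$, with each group weighted by its size $n_i$. This equivalence is our own algebraic reading of the definition, not a statement from the text. The sketch below checks it on hypothetical group margins, not the census figures.

```python
# Hypothetical group margins (n_i, a_i, b_i). Illustrative only.
groups = [(100, 60, 50), (200, 50, 40), (150, 30, 15)]

N = sum(n for n, _, _ in groups)
A = sum(a for _, a, _ in groups)
B = sum(b for _, _, b in groups)

# Between-group regression in the marginal form defined in the text.
s_ab = sum(a * b / n for n, a, b in groups)
s_aa = sum(a * a / n for n, a, _ in groups)
beta_b = (s_ab - A * B / N) / (s_aa - A * A / N)

# The same quantity as a weighted least-squares slope of q_i = b_i/n_i
# on p_i = a_i/n_i, each group weighted by n_i. The weighted means of
# p_i and q_i are A/N and B/N.
p_bar, q_bar = A / N, B / N
num = sum(n * (a / n - p_bar) * (b / n - q_bar) for n, a, b in groups)
den = sum(n * (a / n - p_bar) ** 2 for n, a, _ in groups)
beta_b_wls = num / den

print(beta_b, beta_b_wls)
```

Because the weights are the group sizes, large occupation groups count for more in $\beta_b$ than small ones. With unweighted group proportions the slope would generally differ.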

In analogous problems where the “groups” are areal units, $\beta_b$ has been termed the “ecological regression” by Goodman [5], [6]. A review and an amplification of some points on “ecological regression” treated in Goodman’s two papers and an earlier one by Robinson [9], which introduced the term “ecological correlation,” are given by Duncan, Cuzzort, and Duncan [2, Ch. 3]. The writer, a human ecologist, is made somewhat nervous by these terms, since “ecological correlation” neither has any intrinsic connection with ecology nor is very informative to social statisticians who may happen to be unfamiliar with the somewhat specialized literature in which this terminology appears. In any event, it is appropriate to acknowledge that ideas for this paper derive from a study of the “ecological correlation” problem, although the “ecological” analog to the technique suggested here appears to have been overlooked hitherto.

The three regression coefficients are interrelated by the formula

$$\beta_T = \beta_w(1 - \eta^2_{xz}) + \beta_b\,\eta^2_{xz},$$

in which $\eta^2_{xz}$ is the square of the correlation ratio of education (X) on occupation group (Z). In the present notation,

$$\eta^2_{xz} = \frac{\sum a_i^2/n_i - A^2/N}{A - A^2/N}.$$


As a descriptive measure in itself, $\eta_{XZ}^2$ indicates the degree to which occupations are selective of educational levels. The formula involving the three regression coefficients and the squared correlation ratio provides a decomposition of $\beta_t$, the total difference, into two additive components. The first component is the product of two “within-group” quantities: the average within-group regression of income on education ($\beta_w$), and the proportion of variation in education attributable to variation within groups ($1 - \eta_{XZ}^2$). The second component is the product of the two “between-group” quantities: the between-group regression of income on education ($\beta_b$), and the proportion of the variation in education attributable to educational differences between occupation groups ($\eta_{XZ}^2$). Unlike some systems for decomposing a total difference (e.g., Kitagawa [7]), this decomposition does not require knowledge of the several within-group differences, but only of their “average” in the special sense of this term mentioned earlier. So far as the writer knows, formulas for the sampling errors of the two components are not immediately available. In the case at hand, as in other census-type analyses, the data come from very large samples, so that the problem of sampling errors does not seem urgent.
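The decomposition can be checked numerically. The sketch below is a minimal illustration, not the author's computation. It assumes that $d_i$ counts the persons in group $i$ who have both the specified education and the specified income (the notation of the within-group difference $\beta_i$ defined later in the paper), that $\beta_t$ is the regression of the income dichotomy on the education dichotomy, and that $\beta_w$ is the pooled within-group regression. The group counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    n: int  # persons in the occupation group (n_i)
    a: int  # persons with the specified education (a_i)
    b: int  # persons with the specified income (b_i)
    d: int  # persons with both (d_i); assumed notation


def decompose(groups):
    """Return (beta_t, beta_w, beta_b, eta2) for dichotomous X and Y."""
    N = sum(g.n for g in groups)
    A = sum(g.a for g in groups)
    B = sum(g.b for g in groups)
    D = sum(g.d for g in groups)
    # Total sum of squares of X and total cross-product of X with Y.
    ss_t = A - A * A / N
    cp_t = D - A * B / N
    # Between-group parts use only the group margins a_i, b_i, n_i.
    ss_b = sum(g.a * g.a / g.n for g in groups) - A * A / N
    cp_b = sum(g.a * g.b / g.n for g in groups) - A * B / N
    # Within-group parts are the remainders: only the total D is needed.
    ss_w, cp_w = ss_t - ss_b, cp_t - cp_b
    return cp_t / ss_t, cp_w / ss_w, cp_b / ss_b, ss_b / ss_t


# Hypothetical counts for three occupation groups.
groups = [Group(100, 60, 50, 40), Group(200, 50, 60, 25), Group(150, 30, 20, 8)]
beta_t, beta_w, beta_b, eta2 = decompose(groups)
within, between = beta_w * (1 - eta2), beta_b * eta2
print(f"total {beta_t:.4f} = within {within:.4f} + between {between:.4f}")
```

Note that the individual $d_i$ enter only through their sum $D$, which is what makes the decomposition usable with published marginal tables.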

It should be clear that the “between-occupation component” does not purport to represent all types of “occupational effects” (a term suggested by a reviewer of this paper). If, for example, $\eta_{XZ}^2 = 0$, then $\beta_t = \beta_w$, and the between-occupation component is nil. It may still be true, however, that the individual within-occupation regressions ($\beta_i$) differ among themselves and, in particular, that $\beta_i \neq \beta_w$ for at least some $i$. One might wish to think of a significant $\beta_i - \beta_w$ as, in some sense, an “occupational effect.” It is evidently one that cannot be brought to light by the indirect methods described here.

Table 3 summarizes the numerical results of the computations described. It 
also presents for comparison the results obtained when detailed occupations, 
rather than major occupation groups, are employed. The 446 detailed occupa- 














788 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


tions of the Bureau of the Census were consolidated into 20 intervals of the 
recently developed occupational socio-economic index, Duncan [1]. In effect, 
therefore, the analysis pertains to a 21-group classification (“not in experienced 
civilian labor force” is treated as a separate category) of occupations, the 
categories of which are appreciably more homogeneous in income and educa- 
tional attainment than are major occupation groups. 

In both sets of results the partial difference is substantially less than the 
total difference, i.e., on the average, income differences by education are con- 
siderably less within occupation groups than for the labor force as a whole. 


TABLE 3. MAJOR OCCUPATION GROUP AND DETAILED OCCUPATION 
COMPONENTS OF EDUCATIONAL DIFFERENCE IN INCOME, 
FOR MALES 14 YEARS OLD AND OVER: 1950 














                                             Major occupation    Detailed occupation
Component                                    group controlled    controlled
Proportion with income of $3,000 or more:
  High school graduates                                   .472
  Less than 4 years high school                           .231
  Difference                                              .241
Partial difference                           .122                .078
Between-group regression                     .839                .994
Squared correlation ratio                    .165                .177
“Within” component                           .102                .064
“Between” component                          .138                .176











Source: See Table 2. 


The size of the “between” component implies that about 57 per cent of the gross 
educational difference in income is accounted for by major occupation groups, 
and about 73 per cent by detailed occupation. Certainly these data suggest 
that an educational advantage is translated into an income advantage pri- 
marily, though not exclusively, by pursuing an occupation in which the pre- 
vailing income level is comparatively high. 
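The entries of Table 3 are tied together by the decomposition formula, so the two percentages quoted above can be recomputed from the partial difference ($\beta_w$), the between-group regression ($\beta_b$) and the squared correlation ratio ($\eta^2$). A small arithmetic sketch:

```python
# Table 3 figures: (partial difference, between-group regression, eta^2).
total = 0.241
table3 = {"major occupation group": (0.122, 0.839, 0.165),
          "detailed occupation": (0.078, 0.994, 0.177)}

shares = {}
for label, (beta_w, beta_b, eta2) in table3.items():
    within = beta_w * (1 - eta2)   # "within" component
    between = beta_b * eta2        # "between" component
    shares[label] = between / total
    print(f"{label}: within {within:.3f}, between {between:.3f}, "
          f"sum {within + between:.3f}, between share {between / total:.0%}")
```

The recomputed components agree with the table; their sums (.240) differ from the total difference (.241) only through rounding of the published inputs.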

Somewhat different results are obtained, of course, if the education and in- 
come distributions are broken at points other than those selected for the 
foregoing discussion. Such differences are illustrated in Table 4. If the cutting 
point is toward the low (high) end of the education distribution, the between- 
occupation component is larger, relative to the total difference, at the high 
(low) end of the income distribution. This component, however, is substantial 
in every comparison. 

The most serious substantive limitation on these results is, no doubt, the 
lack of an age control, since both income and education as well as occupation 
vary with age in a complex fashion. The findings, moreover, would perhaps be 
a little easier to interpret if the data pertained solely to members of the labor 
force, since the relationship between income and education is different for 














EDUCATIONAL DIFFERENCES IN INCOME 789 


persons not in the labor force than for any of the occupation groups. In the 
final set of results to be mentioned, both these deficiencies are remedied. The 
data pertain to a national sample of men 35 to 54 years of age who were year- 
round full-time workers in 1956. (See source note to Table 5.) 

This same body of data affords an opportunity to illustrate one additional facet of the indirect approach developed in this paper, i.e., the use of the partial difference (average within-group regression) as an estimate of the individual within-group differences. Such a procedure, unlike those described to this point, requires the analyst to make an assumption that may well be contrary to fact. Yet, under some circumstances he may judge that the assumption is sufficiently close to the truth to make the results worth calculating. In this respect, the procedure outlined below is analogous to the methods of deriving estimates from “ecological regressions” proposed by Goodman [6] and may, in fact, be regarded as an extension of those methods.

TABLE 4. BETWEEN-OCCUPATION COMPONENT OF EDUCATIONAL DIFFERENCES IN INCOME, AT SELECTED CUTTING POINTS OF THE EDUCATION AND INCOME DISTRIBUTIONS, FOR MALES 14 YEARS OLD AND OVER: 1950

Cutting point in education    Total         “Between” component
and income distributions      difference    Major occupation    Detailed
                                            groups              occupations
Elementary school graduation
  $2,000                      .263          .129                .134
  $3,000                      .228          .113                .133
  $4,000                      .135          .071                .089
  $6,000                      .048          .028                .035
High school graduation
  $2,000                      .237          .140                .163
  $3,000                      .241          .138                .176
  $4,000                      .176          .102                .134
  $6,000                      .081          .048                .060
College graduation
  $2,000                      .261          .182                .218
  $3,000                      .340          .218                .279
  $4,000                      .343          .195                .260
  $6,000                      .212          .101                .144

Source: See Table 2.

The within-group difference (regression) for the $i$th occupation group is defined as

$$\beta_i = \frac{d_i}{a_i} - \frac{b_i - d_i}{n_i - a_i}.$$

The assumption we wish to consider is $\beta_i = \beta_w$ for all $i$. On this assumption,

$$\frac{b_i}{n_i} = \frac{b_i - d_i}{n_i - a_i} + \beta_w \frac{a_i}{n_i}.$$













TABLE 5. ESTIMATED AND ACTUAL DISTRIBUTION (IN PROPORTIONS) 
BY 1956 INCOME, BY EDUCATION AND MAJOR OCCUPATION GROUP 
(CONDENSED), FOR MALES 35 TO 54 YEARS OLD WHO WERE 
YEAR-ROUND FULL-TIME WORKERS: MARCH 1957 








                                   Less than 4 years high school        4 years high school or more
Major occupation group             Under     $4,000 to   $6,000         Under     $4,000 to   $6,000
                                   $4,000    $6,000      and over       $4,000    $6,000      and over





Professional, technical and kindred workers
  Estimated   .252  .465  .096  .257  .647
  Actual*     (.417)  (.472)  .245  .646

Farmers and farm managers
  Estimated
  Actual*

Self-employed managers, officials, and proprietors, except farm
  Estimated
  Actual*

Salaried managers, officials, and proprietors, except farm
  Estimated
  Actual*

Clerical, sales, and kindred workers
  Estimated
  Actual*

Craftsmen, foremen, and kindred workers
  Estimated
  Actual*

Operatives and kindred workers
  Estimated   .189
  Actual*     .187  .589

All other
  Estimated   .664  .316  .477  .321
  Actual*     .690  .270  .040  .382  .486











* Except for figures in parentheses, these proportions were transcribed from Table H, Current Population 
Reports, Population Characteristics, Series P-20, No. 99, “Literacy and Educational Attainment: March 1959” 
(Washington: Bureau of the Census, February 4, 1960); the data are a revision of those originally published in 
Series P-20, No. 77 (December 27, 1957). Figures in parentheses were derived from unpublished tables of the 
Bureau of the Census; since they are based on small samples, they are shown for purposes of illustrative com- 
parison only. 


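Hence we define

$$\widehat{\left(\frac{b_i - d_i}{n_i - a_i}\right)} = \frac{b_i}{n_i} - \beta_w \frac{a_i}{n_i};$$

the estimate is accurate if our assumption holds. Similarly, we define

$$\widehat{\left(\frac{d_i}{a_i}\right)} = \frac{b_i}{n_i} + \beta_w \frac{n_i - a_i}{n_i}.$$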


For example, using these formulas on the sample data mentioned above (the 
U. S. Bureau of the Census kindly made available the unpublished detail 







required for calculating these estimates), we obtain for high school graduates 
working as salaried managers, officials and proprietors an estimate of 0.670 
for the proportion with incomes of $6,000 or more in 1956; the actual propor- 
tion, computed directly from the sample cross-tabulated data is 0.664. For 
men in the same occupation with educational attainment below four years of 
high school, the estimated proportion is 0.489; the actual is 0.511. 

We may even push the procedure a little further in order to estimate more 
details of the education-specific income distribution within each occupation 
group. For example, proceeding as before, we estimate the proportion with in- 
comes of $4,000 or more at 0.930 for high school graduates and 0.743 for others 
in the salaried managers, etc., group. Differencing our estimates for incomes 
of $6,000 or more and $4,000 or more, we obtain estimates of 0.260 and 0.254, 
respectively, for the proportion with incomes between $4,000 and $6,000 at the 
two levels of education within this occupation group, as compared with the actual proportions, 0.258 and 0.261.
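The estimating procedure can be sketched as follows. This is a minimal illustration under the paper's assumption $\beta_i = \beta_w$; the function name and the input proportions in the first example are hypothetical, and only the differencing step uses figures quoted in the text.

```python
def estimate_within_group(b_over_n, a_over_n, beta_w):
    """Estimated income proportions for the two education classes of one group.

    b_over_n: proportion of the group with the specified income (b_i / n_i)
    a_over_n: proportion of the group with the specified education (a_i / n_i)
    beta_w:   average within-group difference, assumed to hold in every group
    Returns (estimate of d_i / a_i, estimate of (b_i - d_i) / (n_i - a_i)).
    """
    p_low = b_over_n - beta_w * a_over_n            # less-educated members
    p_high = b_over_n + beta_w * (1.0 - a_over_n)   # better-educated members
    return p_high, p_low


# Hypothetical group: 45% have the income, 60% have the education.
p_high, p_low = estimate_within_group(0.45, 0.60, 0.25)
print(f"{p_high:.3f} {p_low:.3f}")

# Differencing two cumulative estimates gives an interval estimate; here the
# text's salaried-manager figures for "$4,000 or more" minus "$6,000 or more".
hs_4000_6000 = 0.930 - 0.670
other_4000_6000 = 0.743 - 0.489
print(f"{hs_4000_6000:.3f} {other_4000_6000:.3f}")  # 0.260 0.254
```

By construction the two estimates differ by exactly $\beta_w$ and average back (with weights $a_i/n_i$ and $1 - a_i/n_i$) to the observed $b_i/n_i$. Because nothing forces them into the interval from 0 to 1, an estimate outside that range is a sign that the assumption fits the group poorly.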

Table 5 compares a set of estimates derived in this manner with the actual 
distributions obtained by cross-tabulation. Even if the reader’s impression of 
the degree of approximation obtained is favorable, he must remember, of 
course, that the estimates are no better than the assumption on which they 
rest. The analyst may not care to make the requisite assumption if he is work- 
ing with variables whose interrelations are quite obscure. 

For comparison with results set forth earlier, we may mention that the total 
difference between high school graduates and others in proportions with 1956 
incomes of $6,000 or more is 0.305 and the “between” component is 0.168. 
Cutting the income distribution at $4,000, the total difference is 0.274 and the 
“between” component 0.133. The occupational classification here is even more 
condensed than the classification by major groups. Even so, the “between” 
component remains substantial relative to the total difference. Apparently 
the earlier results are not wholly vitiated by the age heterogeneity of the 
population and the inclusion of “not in labor force” as a quasi-occupation 
group. 

On the substantive side, the findings sketched here argue strongly for a 
systematic consideration of occupation in attempts to interpret income dif- 
ferences by education. Fortunately, it appears that a cross-tabulation of the 
three variables (with age and color controls) will become available as special 
reports from the 1960 census are completed. Where the analyst must work with 
data that are already published, however, he may find the technique described 
here a useful supplement to the standard set of tools for reaching conclusions 
in a roundabout manner. 


REFERENCES 


[1] Duncan, Otis Dudley, “A Socio-Economic Index for All Occupations,” in Occupations 
and Social Status, by Albert J. Reiss, Jr., et al. New York: The Free Press of Glencoe, 
in press. 

[2] Duncan, Otis Dudley, Cuzzort, Ray P., and Duncan, Beverly, Statistical Geography. Glencoe, Illinois: The Free Press, 1961.

[3] Glick, Paul C., “Educational Attainment and Occupational Advancement,” Transactions of the Second World Congress of Sociology, Vol. II. London: International Sociological Association, 1954, 183-93.

[4] Glick, Paul C. and Miller, Herman P., “Educational Level and Potential Income,” American Sociological Review, 21 (1956), 307-21.

[5] Goodman, Leo A., “Ecological Regression and Behavior of Individuals,” American Sociological Review, 18 (1953), 663-4.

[6] Goodman, Leo A., “Some Alternatives to Ecological Correlation,” American Journal of Sociology, 64 (1959), 610-25.

[7] Kitagawa, Evelyn M., “Components of a Difference between Two Rates,” Journal of the American Statistical Association, 50 (1955), 1168-94.

[8] Miller, Herman P., “Annual and Lifetime Income in Relation to Education: 1939-1959,” American Economic Review, 50 (1960), 962-86.

[9] Robinson, W. S., “Ecological Correlations and the Behavior of Individuals,” American Sociological Review, 15 (1950), 351-7.

[10] Walker, Helen M. and Lev, Joseph, Statistical Inference. New York: Henry Holt & Company, 1953.





TESTING THE INDEPENDENCE OF 
REGRESSION DISTURBANCES 


H. Theil and A. L. Nagar
Econometric Institute, Netherlands School of Economics 


This article deals with the distribution of the Von Neumann ratio 
of least-squares estimated regression disturbances. This distribution is 
approximated by a beta distribution under the condition that the be- 
haviour of explanatory variables of the regression over time is suf- 
ficiently smooth. Two examples are presented, together with a table 
containing 1 and 5 per cent significance limits for a number of observa- 
tions ranging from 15 to 100 and a number of coefficients adjusted 
ranging from 2 to 6. 


1. INTRODUCTION
THE possibility of serial correlation of disturbances presents a serious problem in time series regression analysis. The best-known contribution to this field is that of Durbin and Watson [1, 2],¹ who formulated a procedure for testing the null-hypothesis of residual independence based on the Von Neumann ratio of the least-squares estimated disturbances:²


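$$Q = \frac{\sum_{t=2}^{T} \{\hat u(t) - \hat u(t-1)\}^2}{\sum_{t=1}^{T} \hat u(t)^2}, \tag{1.1}$$

where $\hat u(t)$ is the least-squares estimator of the disturbance $u(t)$ in the regression equation

$$y(t) = \beta_1 x_1(t) + \cdots + \beta_\Lambda x_\Lambda(t) + u(t). \tag{1.2}$$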


This contribution is, however, only a partial one, since it does not specify precise significance limits but only upper ($Q_U$) and lower ($Q_L$) bounds to these. Hence the inference takes the following form: if $Q < Q_L$, the null-hypothesis is rejected (at the significance level corresponding to $Q_L$) in favour of the alternative hypothesis of positive serial correlation; if $Q > Q_U$, the null-hypothesis is not rejected; and if $Q_L < Q < Q_U$, no inference is possible.
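For concreteness, the statistic (1.1) and the bounds-test decision rule can be written as two short functions. This is a sketch: the function names are hypothetical, the residuals are assumed to come from a least-squares fit, and the bounds $Q_L$, $Q_U$ would have to be taken from the Durbin-Watson tables for the given $T$, $\Lambda$ and significance level (the values used below are placeholders).

```python
def von_neumann_ratio(residuals):
    """Q of (1.1): squared first differences over squared residuals."""
    num = sum((r1 - r0) ** 2 for r0, r1 in zip(residuals, residuals[1:]))
    den = sum(r * r for r in residuals)
    return num / den


def bounds_test(q, q_lower, q_upper):
    """Durbin-Watson bounds decision against positive serial correlation."""
    if q < q_lower:
        return "reject independence"
    if q > q_upper:
        return "do not reject independence"
    return "no inference possible"


print(von_neumann_ratio([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(bounds_test(0.9, 1.1, 1.5))                 # reject independence
```

The third branch is the "region of ignorance" discussed next.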

Obviously, the “region of ignorance,” i.e., the interval (Qz, Qu), presents a 
great difficulty, especially if it is large. And it is large when the number of ob- 
servations is small or moderate (around 20, say) and the number of explanatory 
variables not very small. This implies an inconvenience for the practical re- 
search worker, for he is then confronted with a situation in which he has no 
guidance; and it implies a danger insofar as the practical worker feels that he 
can interpret “no inference possible” as equivalent to “no need to reject the
null-hypothesis of independence,” for this will necessarily lead to bias in the 
sense that too many cases of positive serial correlation are overlooked. 





1 Reference is also made to Hannan [3, 4] and to Hildreth and Lu [5].

2 The Von Neumann ratio will be interpreted throughout this paper as the sum of squares of the first differ- 
ences of the least-squares estimated disturbances divided by the sum of squares of the estimated disturbances 
themselves. This is not identical with the ratio of the mean square successive difference to the variance of the 
disturbance estimates, the difference being a factor $T/(T-1)$, where $T$ is the number of observations. We follow
Durbin and Watson in this respect because it facilitates slightly the computations. 




These remarks should in no way be interpreted as a criticism of Durbin's and Watson's pioneering work. As a matter of fact, the present article (which presents unique although approximate significance limits) is based very heavily on their results. This paper, too, is based on the Von Neumann ratio of the estimated disturbances. It is concerned with the derivation of significance limits of this ratio for the case in which the first and second differences of the explanatory variables are small in absolute value compared with the range of the corresponding variable itself.³ This condition is met satisfactorily for most economic time series, except of course when such a series has already been transformed to a first-difference basis prior to the regression computation; in that case our procedure is not recommended. In addition, a convenient though approximate method is indicated for computing a revised regression in the case when the null-hypothesis is rejected.


2. TWO EXAMPLES 
We shall deal with regressions of the type 
$$y(t) = \beta_1 x_1(t) + \beta_2 x_2(t) + \cdots + u(t), \tag{2.1}$$

where the values taken by the $x$'s are assumed nonstochastic (one of them may be 1 for all $t$, so that the corresponding $\beta$ then represents a constant term), while the disturbances (the $u$'s) are supposed to be normally distributed with zero mean and constant variance $\sigma^2$. In the null-hypothesis the $u$'s are assumed to be mutually independent. We shall write $T$ for the number of observations, and $\Lambda$ for the number of coefficients in (2.1), i.e., the number of explanatory variables plus one for the constant term (if there is a constant term).

The procedure will be illustrated with two numerical examples, one of which deals with the demand for spirits in the United Kingdom from 1870 to 1938, the other with the demand for textiles in the Netherlands from 1923 to 1939.⁴ In both cases we have annual data (hence $T = 69$ for the spirits example, $T = 17$ for the textile example), and in both cases $y$ refers to the logarithm of consumption per head, $x_1$ to the logarithm of real income per head, and $x_2$ to the logarithm of the deflated price of the commodity. Hence $\beta_1$ stands for the income elasticity and $\beta_2$ for the price elasticity of spirits in Britain and of textiles in the Netherlands in the periods just mentioned.

It will prove convenient to follow Durbin's and Watson's example by computing not only the sums of squares and products of the variables themselves but also those of their first (backward) differences, like $\Delta y(t) = y(t) - y(t-1)$. They are shown in Table 1. This procedure has three advantages. First, it leads to an easy computation of the Von Neumann ratio, as will become clear at the end of this section. Second, it enables us to judge to what extent the approximations to be applied in Section 3 are acceptable. Third, it is computationally convenient for the procedure which will be proposed in Section 4.2 in case the null-hypothesis of residual independence is rejected.

3 It is worth-while to add that this condition is in most cases particularly acceptable when the observations are arranged, not in chronological order, but according to increasing values of the explanatory variable (assuming that there is only one such variable). The use of Q in this manner amounts to a test for linearity of the regression. Reference is made to Prais and Houthakker [7, p. 53].

4 The spirits example was also used by Durbin and Watson for illustrative purposes; it is due to Prest [8]. For the underlying time series we refer to Table 1, p. 160 of reference [2], and call attention to a printing error in the income column: the last observation should be 2.1182. The time series of the textile example are given in the Appendix of this paper.

INDEPENDENCE OF REGRESSION DISTURBANCES 795

TABLE 1. SUMS OF SQUARES AND PRODUCTS

                y            x1           x2           1         |  Δy        Δx1        Δx2        1
Spirits
  y | Δy      221.262982   238.367612   255.009106   122.1562    |  .112627   .014683    -.076413   -.6844
  x1 | Δx1                 266.286023   287.819282   135.3888    |            .023539    .000527    .3513
  x2 | Δx2                              312.604832   146.1679    |                       .083559    .5404
  1 | 1                                                69         |                                  68
Textile
  y | Δy       76.658975    72.596420    67.441918    36.0763    |  .024866   .001829    -.018766   .2223
  x1 | Δx1                  68.841788    64.064566    34.2078    |            .002930    .001562    .0308
  x2 | Δx2                               59.759499    31.8339    |                       .021104    -.2169
  1 | 1                                                17         |                                  16

The elasticities are then estimated by solving the normal equations in the conventional manner, which gives:

$$\text{spirits: } b_1 = -0.120,\; b_2 = -1.228; \qquad \text{textile: } b_1 = 1.143,\; b_2 = -0.829. \tag{2.2}$$


Next, we consider the sums of squares of the least-squares estimated disturbances:

$$\text{spirits: } \sum_t \hat u(t)^2 = 0.22096; \qquad \text{textile: } \sum_t \hat u(t)^2 = 0.002567, \tag{2.3}$$





which enable us to compute standard errors of the estimated elasticities under 
the condition that the null-hypothesis is true. They are obtained in the usual 
manner, which gives: 

$$\text{spirits: } s_{b_1} = 0.108,\; s_{b_2} = 0.050; \qquad \text{textile: } s_{b_1} = 0.156,\; s_{b_2} = 0.036. \tag{2.4}$$


Finally, we consider the first differences of the estimated disturbances, whose 
sum of squares is 


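$$\sum_t \{\Delta \hat u(t)\}^2 = \sum_t \{\Delta y(t)\}^2 - 2 \sum_\lambda b_\lambda \sum_t \Delta x_\lambda(t)\,\Delta y(t) + \sum_\lambda \sum_\mu b_\lambda b_\mu \sum_t \Delta x_\lambda(t)\,\Delta x_\mu(t). \tag{2.5}$$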


Using the right-hand part of Table 1, we find 
$$\text{spirits: } \sum_t \{\Delta \hat u(t)\}^2 = 0.05497; \qquad \text{textile: } \sum_t \{\Delta \hat u(t)\}^2 = 0.004943. \tag{2.6}$$


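The Von Neumann ratio of the least-squares estimated disturbances is then

$$Q = \frac{\sum_t \{\Delta \hat u(t)\}^2}{\sum_t \hat u(t)^2} = \begin{cases} 0.249 & \text{for spirits;} \\ 1.926 & \text{for textile,} \end{cases} \tag{2.7}$$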




which is the statistic which will be used for testing the null-hypothesis of 
residual independence. The reader who is interested in results only is advised to 
proceed immediately to Section 4.⁵
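The computations of this section can be reproduced from the moments of Table 1. The sketch below is illustrative rather than the authors' own procedure. It solves the normal equations in deviations from the means (so the constant term is eliminated), takes $\Lambda = 3$, and evaluates (2.5) with the rounded elasticities of (2.2). Only the spirits regression is re-estimated: the textile level sums in Table 1 appear too coarsely rounded, relative to that example's small residual sum of squares, to reproduce (2.3) closely.

```python
# Level moments from Table 1, spirits (T = 69); x1 = log income, x2 = log price.
T = 69
LEV = {"yy": 221.262982, "yx1": 238.367612, "yx2": 255.009106, "y": 122.1562,
       "x1x1": 266.286023, "x1x2": 287.819282, "x1": 135.3888,
       "x2x2": 312.604832, "x2": 146.1679}


def centred(sum_xy, sum_x, sum_y, n):
    """Sum of products in deviations from the means (eliminates the constant)."""
    return sum_xy - sum_x * sum_y / n


c11 = centred(LEV["x1x1"], LEV["x1"], LEV["x1"], T)
c12 = centred(LEV["x1x2"], LEV["x1"], LEV["x2"], T)
c22 = centred(LEV["x2x2"], LEV["x2"], LEV["x2"], T)
c1y = centred(LEV["yx1"], LEV["x1"], LEV["y"], T)
c2y = centred(LEV["yx2"], LEV["x2"], LEV["y"], T)
cyy = centred(LEV["yy"], LEV["y"], LEV["y"], T)

det = c11 * c22 - c12 ** 2
b1 = (c22 * c1y - c12 * c2y) / det                          # (2.2)
b2 = (c11 * c2y - c12 * c1y) / det
rss = cyy - b1 * c1y - b2 * c2y                             # (2.3)
s2 = rss / (T - 3)                                          # Lambda = 3
se1, se2 = (s2 * c22 / det) ** 0.5, (s2 * c11 / det) ** 0.5  # (2.4)


def diff_ss(b, d):
    """(2.5): sum of squared first differences of the residuals."""
    cross = b[0] * d["y1"] + b[1] * d["y2"]
    quad = b[0] ** 2 * d["11"] + 2 * b[0] * b[1] * d["12"] + b[1] ** 2 * d["22"]
    return d["yy"] - 2 * cross + quad


# First-difference moments from the right-hand part of Table 1.
D_SPIRITS = {"yy": .112627, "y1": .014683, "y2": -.076413,
             "11": .023539, "12": .000527, "22": .083559}
D_TEXTILE = {"yy": .024866, "y1": .001829, "y2": -.018766,
             "11": .002930, "12": .001562, "22": .021104}
q_spirits = diff_ss((-0.120, -1.228), D_SPIRITS) / 0.22096   # (2.6), (2.7)
q_textile = diff_ss((1.143, -0.829), D_TEXTILE) / 0.002567
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, RSS = {rss:.5f}, se = ({se1:.3f}, {se2:.3f})")
print(f"Q: spirits {q_spirits:.3f}, textile {q_textile:.3f}")
```

Because the constant's first difference is identically zero, it drops out of (2.5), which is why only the slopes appear in `diff_ss`.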


3. THE APPROXIMATE DISTRIBUTION OF THE VON NEUMANN RATIO OF
LEAST-SQUARES ESTIMATED REGRESSION DISTURBANCES 


3.1. INTRODUCTORY 


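It will prove convenient to arrange the values taken by our variables in vectors and matrices. Thus, we write $y$ for the vector of $T$ values taken by the dependent variable and $X$ for the $T \times \Lambda$ matrix of values taken by the explanatory variables. Further, we write $u$ for the column vector of the $T$ “true” disturbances, and $\hat u$ for its least-squares estimator. Then, as is well-known (see e.g. [10, p. 211]):

$$\hat u = [I - X(X'X)^{-1}X']\,u, \tag{3.1}$$

where $I$ is the unit matrix of order $T$; and

$$Q = \frac{\hat u' A \hat u}{\hat u' \hat u}, \tag{3.2}$$

where $A$ is the symmetric $T \times T$ matrix

$$A = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 & 0 \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{bmatrix}. \tag{3.3}$$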
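A small numerical check of (3.2) and (3.3) may help. The sketch below (function names hypothetical) builds $A$ for a given $T$, confirms that $u'Au$ equals the sum of squared first differences for an arbitrary vector, and evaluates the traces $\operatorname{tr} A = 2T - 2$ and $\operatorname{tr} A^2 = 6T - 8$ that appear in Sections 3.2 and 3.3.

```python
def von_neumann_matrix(T):
    """T x T matrix A of (3.3): u'Au = sum of squared first differences of u."""
    A = [[0.0] * T for _ in range(T)]
    for t in range(T):
        A[t][t] = 1.0 if t in (0, T - 1) else 2.0
        if t > 0:
            A[t][t - 1] = A[t - 1][t] = -1.0
    return A


def quad_form(A, u):
    """u'Au for a square matrix A stored as a list of rows."""
    n = len(u)
    return sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))


T = 8
A = von_neumann_matrix(T)
u = [0.3, -1.2, 0.5, 2.0, 1.1, -0.4, 0.0, 0.9]   # arbitrary vector
first_diff_ss = sum((u[t] - u[t - 1]) ** 2 for t in range(1, T))
trace_A = sum(A[i][i] for i in range(T))
trace_A2 = sum(A[i][j] ** 2 for i in range(T) for j in range(T))  # A symmetric
print(quad_form(A, u), first_diff_ss)
print(trace_A, 2 * T - 2, trace_A2, 6 * T - 8)
```

Every row of $A$ sums to zero, so $A$ annihilates a constant vector; this is why a constant term contributes nothing to $X'AX$.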





It is well-known that Q lies between 0 and 4, but it is less generally realized that these limits cannot always be attained if Q refers to least-squares estimated disturbances. Take e.g. $\Lambda = 2$, one of the explanatory variables representing the constant term. Then $Q = 0$ would imply that all disturbances are equal (to $\varepsilon$, say) and that the observed points of the two-dimensional scatter all lie on a straight line parallel to the regression line. This is impossible, because in that case the least-squares procedure automatically shifts the regression line upward by the amount $\varepsilon$. In fact, the attainable limits of Q are determined by the roots of the matrix

$$K = A - AX(X'X)^{-1}X'. \tag{3.4}$$

This matrix is positive semi-definite with $\Lambda$ zero roots. The minimum of Q is, as stated by Durbin and Watson, the smallest of the $T - \Lambda$ positive roots; the maximum is equal to the largest root. Let us indicate the positive roots by $\nu_1, \nu_2, \ldots, \nu_{T-\Lambda}$; then the following result is available for the expectation of Q under the null-hypothesis stated:⁶

$$E(Q) = \frac{1}{T - \Lambda} \sum_{i=1}^{T-\Lambda} \nu_i = \bar\nu, \text{ say;} \tag{3.5}$$

and for the next three moments about the mean:

$$\mu_2 = \frac{2 \sum_i (\nu_i - \bar\nu)^2}{(T-\Lambda)(T-\Lambda+2)}, \qquad \mu_3 = \frac{8 \sum_i (\nu_i - \bar\nu)^3}{(T-\Lambda)(T-\Lambda+2)(T-\Lambda+4)},$$

$$\mu_4 = \frac{48 \sum_i (\nu_i - \bar\nu)^4 + 12 \left\{\sum_i (\nu_i - \bar\nu)^2\right\}^2}{(T-\Lambda)(T-\Lambda+2)(T-\Lambda+4)(T-\Lambda+6)}. \tag{3.6}$$

These results are exact and will be used in Sections 3.2 to 3.5 to fit a beta distribution to the true distribution of Q under certain approximations.

5 Except that he is also advised to check whether it is indeed true that the behaviour of the explanatory variables is sufficiently smooth in the sense that their first and second differences are small compared with the range of the corresponding variable itself. This can be done by comparing the second moments of the first differences with the moments about the mean of the corresponding original variables.
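Because (3.5) and (3.6) depend on the data only through the roots $\nu_i$, they can be checked by simulation. The sketch below is an illustrative Monte Carlo check, not part of the original argument. It relies on the standard fact that under the null-hypothesis $Q$ has the distribution of $\sum_i \nu_i \xi_i^2 / \sum_i \xi_i^2$ with independent standard normal $\xi_i$; the roots used are hypothetical.

```python
import random


def exact_moments(nu):
    """Mean and central moments mu2, mu3, mu4 of Q from (3.5) and (3.6)."""
    m = len(nu)  # T - Lambda
    mean = sum(nu) / m
    s2 = sum((v - mean) ** 2 for v in nu)
    s3 = sum((v - mean) ** 3 for v in nu)
    s4 = sum((v - mean) ** 4 for v in nu)
    mu2 = 2 * s2 / (m * (m + 2))
    mu3 = 8 * s3 / (m * (m + 2) * (m + 4))
    mu4 = (48 * s4 + 12 * s2 ** 2) / (m * (m + 2) * (m + 4) * (m + 6))
    return mean, mu2, mu3, mu4


def simulate_q(nu, draws, seed=1961):
    """Draws of sum(nu_i xi_i^2) / sum(xi_i^2) with xi_i ~ N(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        xi2 = [rng.gauss(0.0, 1.0) ** 2 for _ in nu]
        out.append(sum(v * x for v, x in zip(nu, xi2)) / sum(xi2))
    return out


nu = [0.2, 0.5, 0.9, 1.4, 2.0, 2.6, 3.1, 3.5, 3.8]   # hypothetical roots
mean, mu2, mu3, mu4 = exact_moments(nu)
qs = simulate_q(nu, 20000)
sim_mean = sum(qs) / len(qs)
sim_var = sum((q - sim_mean) ** 2 for q in qs) / (len(qs) - 1)
print(f"mean {mean:.4f} vs {sim_mean:.4f}; variance {mu2:.4f} vs {sim_var:.4f}")
```

The simulated mean and variance should agree with (3.5) and the first formula of (3.6) to within sampling error.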








3.2 THE MEAN

According to (3.5), the expectation of Q is $1/(T-\Lambda)$ times the sum of the roots of $K$ or, what amounts to the same thing, $1/(T-\Lambda)$ times the trace of $K$ (the sum of the diagonal elements of $K$). Applying some elementary properties of traces of matrices, we find

$$\operatorname{tr} K = \operatorname{tr} A - \operatorname{tr} AX(X'X)^{-1}X' = (2T - 2) - \operatorname{tr} (X'X)^{-1}X'AX, \tag{3.7}$$


see (3.3). Now $X'AX$ is the matrix of sums of squares and products of the first differences of the explanatory variables:

$$X'AX = \left[\sum_{t=2}^{T} \Delta x_\lambda(t)\, \Delta x_\mu(t)\right], \tag{3.8}$$


and these sums of squares and products are usually, on the average, small compared with the corresponding elements of $X'X$. This leads to the presumption that we might be able to neglect the trace term after the second equality sign of (3.7) relative to $(2T - 2)$. In fact, for our two examples of Section 2 we have

$$\operatorname{tr} K = \begin{cases} (138 - 2) - 0.122 & \text{for spirits;} \\ (34 - 2) - 0.518 & \text{for textile,} \end{cases} \tag{3.9}$$

which shows that the error committed by neglecting the trace term amounts to less than one-tenth of one per cent in the spirits example, and about 1½ per cent in the textile example. Generally, one should expect this feature when the behaviour of the explanatory variables over time is sufficiently smooth. The fact that the error is larger in the latter example is primarily due to the rather





6 These results are derived by using the fact that under the null-hypothesis Q is stochastically independent of its own denominator [1, pp. 418-9], so that any moment of Q is equal to the ratio of the corresponding moments of numerator and denominator. The latter moments are derived by means of the $\chi^2$ distribution.










small range of variation of per capita income in the Netherlands during the 
period considered. 

An alternative interpretation of the trace term is also instructive. The matrix $(X'X)^{-1}X'AX$ can be interpreted as the matrix of least-squares regression coefficients in a set of regressions in which $X$ represents the values taken by the independent variables in each of the equations, and the successive columns of $AX$ the values taken by the various dependent variables. Now $AX$ is essentially the matrix of second differences of the X-variables, apart from sign and from terminal effects. More precisely:

$$A x_\lambda = \begin{bmatrix} x_\lambda(1) - x_\lambda(2) \\ -x_\lambda(1) + 2x_\lambda(2) - x_\lambda(3) \\ \vdots \\ -x_\lambda(T-2) + 2x_\lambda(T-1) - x_\lambda(T) \\ -x_\lambda(T-1) + x_\lambda(T) \end{bmatrix} = \begin{bmatrix} -\Delta x_\lambda(2) \\ -\Delta^2 x_\lambda(3) \\ \vdots \\ -\Delta^2 x_\lambda(T) \\ \Delta x_\lambda(T) \end{bmatrix}, \tag{3.10}$$

where $\lambda = 1, \ldots, \Lambda$. Hence $(X'X)^{-1}X'AX$ is minus the coefficient matrix of the regressions of the second differences of the explanatory variables on these variables themselves (apart from terminal effects), so that the trace of that matrix is simply the sum of the “own” coefficients, those relating $\Delta^2 x_\lambda$ to $x_\lambda$, for $\lambda = 1, \ldots, \Lambda$.

We suggest neglecting the trace term in (3.7), because this simplifies the approach considerably. On combining this with (3.5), we obtain

$$E(Q) \approx 2\,\frac{T - 1}{T - \Lambda}. \tag{3.11}$$














3.3 THE VARIANCE

According to (3.6), the numerator of the variance of Q equals $\sum_i (\nu_i - \bar\nu)^2$ apart from a factor 2; this sum can also be written $\sum_i \nu_i^2 - (T - \Lambda)\bar\nu^2$. On applying (3.5), we therefore have

$$\sum_i (\nu_i - \bar\nu)^2 = \sum_i \nu_i^2 - (T - \Lambda)(EQ)^2, \tag{3.12}$$

see Section 3.2. Considering $\sum_i \nu_i^2$, we observe that the square of a latent root of a matrix $K$ is also a latent root of $K^2$, so $\sum_i \nu_i^2$ is equal to the trace of $K^2$. For this trace we have

$$\operatorname{tr} K^2 = \operatorname{tr} A^2 - \operatorname{tr} A^2X(X'X)^{-1}X' - \operatorname{tr} AX(X'X)^{-1}X'A + \operatorname{tr} AX(X'X)^{-1}X'AX(X'X)^{-1}X',$$

which can also be written in the form

$$\sum_i \nu_i^2 = (6T - 8) - 2\operatorname{tr} (X'X)^{-1}X'A^2X + \operatorname{tr}\left[X'AX(X'X)^{-1}\right]^2, \tag{3.13}$$

given the fact that $\operatorname{tr} A^2$, being the sum of squares of all elements of $A$, is equal to $6T - 8$.
Considering the second term on the right of (3.13), we note that $X'A^2X = (AX)'AX$ is the matrix of the sums of squares and products of the second differences of the explanatory variables apart from terminal effects; see (3.10). We should expect that the elements of this matrix are small compared with the corresponding elements of $X'X$ when the behaviour of our explanatory variables is sufficiently smooth, so that the first trace term on the right of (3.13) is then small. The same thing can be expected to be true under the same condition for the second trace term. For our two examples the decomposition (3.13) is as follows:

$$\sum_i \nu_i^2 = \begin{cases} (414 - 8) - 0.378 + 0.015 & \text{for spirits;} \\ (102 - 8) - 1.450 + 0.161 & \text{for textile,} \end{cases} \tag{3.14}$$

which shows that the error caused by neglecting the last two terms amounts to less than one-tenth of one per cent in the spirits case, and less than 1½ per cent in the textile case. We can go on in this way by combining the approximations for $\sum_i \nu_i$ and $\sum_i \nu_i^2$ to find the corresponding approximation error in $\sum_i (\nu_i - \bar\nu)^2$; this is still on the small side, viz., about one-tenth of one per cent for the spirits case, but slightly larger for that of textile, viz., 5 per cent.
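The percentages quoted in this paragraph and after (3.9) can be recomputed from the decompositions (3.9) and (3.14), using $\Lambda = 3$ in both examples (income, price and the constant term) and (3.12) for the sum of squared deviations. This is a plain arithmetic sketch with a hypothetical helper name; errors are expressed relative to the exact values.

```python
def approximation_errors(T, lam, tr_term, a2_term, quad_term):
    """Relative errors from dropping the X-dependent terms in (3.7) and (3.13).

    tr_term:   tr (X'X)^-1 X'AX, as reported in (3.9)
    a2_term:   2 tr (X'X)^-1 X'A^2X, as reported in (3.14)
    quad_term: tr [X'AX(X'X)^-1]^2, as reported in (3.14)
    """
    m = T - lam
    tr_k_approx, tr_k = 2 * T - 2, 2 * T - 2 - tr_term
    ss_approx, ss = 6 * T - 8, 6 * T - 8 - a2_term + quad_term
    dev_approx = ss_approx - tr_k_approx ** 2 / m   # (3.12), approximated
    dev = ss - tr_k ** 2 / m                        # (3.12), exact traces
    return (abs(tr_k_approx - tr_k) / tr_k,
            abs(ss_approx - ss) / ss,
            abs(dev - dev_approx) / dev)


for name, args in {"spirits": (69, 3, 0.122, 0.378, 0.015),
                   "textile": (17, 3, 0.518, 1.450, 0.161)}.items():
    e1, e2, e3 = approximation_errors(*args)
    print(f"{name}: trace {e1:.2%}, sum of squares {e2:.2%}, deviations {e3:.2%}")
```

For the textile example the three errors come out at roughly 1.6, 1.4 and 4.8 per cent, in line with the "about 1½", "less than 1½" and "5 per cent" of the text.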

Our suggestion is to approximate $\sum_i \nu_i^2$ by the first term on the right of (3.13) when the behaviour of the explanatory variables is smooth, and to combine this with the approximation of Section 3.2 to derive the variance. This gives:

$$\operatorname{var} Q \approx 2\,\frac{6T - 8 - (T - \Lambda)\left(2\,\dfrac{T-1}{T-\Lambda}\right)^2}{(T-\Lambda)(T-\Lambda+2)} = 4\,\frac{T^2 - 3\Lambda T + 4\Lambda - 2}{(T-\Lambda)^2(T-\Lambda+2)}. \tag{3.15}$$
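The quality of (3.11) and (3.15) can be explored numerically. The sketch below (pure Python, small $T$, helper names hypothetical) forms $K$ from (3.4), obtains $\sum \nu_i = \operatorname{tr} K$ and $\sum \nu_i^2 = \operatorname{tr} K^2$ (the remaining $\Lambda$ roots being zero), and compares the exact mean and variance from (3.5), (3.6) and (3.12) with the approximations. A linear trend stands in for a smooth regressor and an alternating series for a non-smooth one; both are hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse(P):
    """Gauss-Jordan inverse with partial pivoting (adequate for small X'X)."""
    n = len(P)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def exact_and_approx(X):
    """((exact mean, exact var), (approx mean, approx var)) of Q for regressors X."""
    T, lam = len(X), len(X[0])
    m = T - lam
    A = [[(1.0 if t in (0, T - 1) else 2.0) if s == t else (-1.0 if abs(s - t) == 1 else 0.0)
          for s in range(T)] for t in range(T)]
    Xt = [list(col) for col in zip(*X)]
    H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)            # X(X'X)^-1 X'
    AH = matmul(A, H)
    K = [[A[i][j] - AH[i][j] for j in range(T)] for i in range(T)]  # (3.4)
    K2 = matmul(K, K)
    s1 = sum(K[i][i] for i in range(T))                          # sum of roots
    s2 = sum(K2[i][i] for i in range(T))                         # sum of squared roots
    exact = (s1 / m, 2 * (s2 - s1 * s1 / m) / (m * (m + 2)))
    approx = (2 * (T - 1) / m,                                                   # (3.11)
              4 * (T * T - 3 * lam * T + 4 * lam - 2) / (m * m * (m + 2)))       # (3.15)
    return exact, approx


T = 30
smooth = [[1.0, float(t)] for t in range(1, T + 1)]
rough = [[1.0, float((-1) ** t)] for t in range(1, T + 1)]
for name, X in (("trend", smooth), ("alternating", rough)):
    (em, ev), (am, av) = exact_and_approx(X)
    print(f"{name}: mean {em:.4f} vs {am:.4f}, variance {ev:.4f} vs {av:.4f}")
```

With the trend the approximations are very close; with the alternating regressor, whose first differences are as large as its range, the mean is off by several per cent, which illustrates why the smoothness condition matters.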
3.4 SKEWNESS AND KURTOSIS 
For the moments of the third and fourth order we can proceed in the same way. By neglecting $\operatorname{tr}(X'X)^{-1}X'A^3X$, etc., we find for the third moment of Q:

$$\mu_3 \approx -\frac{32(\Lambda - 1)(T^2 - 5\Lambda T + 8\Lambda - 4)}{(T-\Lambda)^3(T-\Lambda+2)(T-\Lambda+4)}, \tag{3.16}$$

and, when dividing this by the cube of the standard deviation, we obtain the Pearsonian skewness coefficient:⁷

$$\beta_1^{1/2} \approx -\frac{4(\Lambda - 1)(T^2 - 5\Lambda T + 8\Lambda - 4)(T - \Lambda + 2)^{1/2}}{(T - \Lambda + 4)(T^2 - 3\Lambda T + 4\Lambda - 2)^{3/2}} \approx -\frac{4(\Lambda - 1)}{T^{3/2}}, \tag{3.17}$$

where the last ratio retains only the leading term (of order $T^{-3/2}$). In the same way, by neglecting such terms as $\operatorname{tr}(X'X)^{-1}X'A^4X$, we obtain an expression for $\mu_4$ involving $T$ and $\Lambda$ only. After dividing this fourth moment by the square of the variance, we obtain the Pearsonian kurtosis coefficient:

$$\beta_2 \approx 3 - \frac{6}{T}, \tag{3.18}$$

terms of higher order of smallness than $1/T$ being neglected.





7 The Pearsonian coefficients $\beta_1$ and $\beta_2$ should not be confused with the coefficient vector $\beta$ of the regression problem. The former coefficients play a role in Sections 3.4 and 3.5 only.










The results (3.17) and (3.18) imply that the skewness coefficient of the Von Neumann ratio of the estimated regression disturbances deviates from zero (the symmetry value) to the order $1/T^{3/2}$, and that the kurtosis coefficient deviates from 3 (the value of mesokurtosis) to the order $1/T$; at least, when it is true that successive differences in the values taken by the explanatory variables are sufficiently small.
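Because (3.16) fixes the sign of the skewness, it is worth comparing with the exact third moment from (3.6), using $\sum \nu_i^3 = \operatorname{tr} K^3$ and $\sum (\nu_i - \bar\nu)^3 = \sum \nu_i^3 - 3\bar\nu \sum \nu_i^2 + 2(T-\Lambda)\bar\nu^3$. The sketch below does this in pure Python for a constant plus a linear trend (a hypothetical smooth regressor, $\Lambda = 2$) and for a constant alone ($\Lambda = 1$), where (3.16) gives zero; helper names are illustrative.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse(P):
    """Gauss-Jordan inverse with partial pivoting (adequate for small X'X)."""
    n = len(P)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def third_moment(X):
    """Exact mu_3 of Q via (3.6) with traces of K, and the approximation (3.16)."""
    T, lam = len(X), len(X[0])
    m = T - lam
    A = [[(1.0 if t in (0, T - 1) else 2.0) if s == t else (-1.0 if abs(s - t) == 1 else 0.0)
          for s in range(T)] for t in range(T)]
    Xt = [list(col) for col in zip(*X)]
    H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)
    AH = matmul(A, H)
    K = [[A[i][j] - AH[i][j] for j in range(T)] for i in range(T)]   # (3.4)
    K2 = matmul(K, K)
    K3 = matmul(K2, K)
    s1, s2, s3 = (sum(M[i][i] for i in range(T)) for M in (K, K2, K3))
    nu_bar = s1 / m
    dev3 = s3 - 3 * nu_bar * s2 + 2 * m * nu_bar ** 3   # sum of cubed deviations
    exact = 8 * dev3 / (m * (m + 2) * (m + 4))
    approx = (-32 * (lam - 1) * (T * T - 5 * lam * T + 8 * lam - 4)
              / (m ** 3 * (m + 2) * (m + 4)))
    return exact, approx


T = 60
trend = [[1.0, float(t)] for t in range(1, T + 1)]
const = [[1.0] for _ in range(T)]
print("constant + trend:", third_moment(trend))
print("constant only:   ", third_moment(const))
```

For the trend the exact and approximate third moments are both negative and agree closely. With a constant alone the nonzero roots of $K$ are symmetric about 2, so the exact third moment vanishes, as (3.16) predicts.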


3.5 A BETA APPROXIMATION 

The results obtained so far will be used to fit a beta distribution (with range to be determined) to the distribution of Q; that is, we shall determine the parameters of the beta distribution and its range by identifying its moments with those of the Q-distribution to a satisfactory order of approximation. As is well-known, a beta variable in the range (0, 1) has the following density function:

$$\frac{1}{B(a, b)}\, z^{a-1}(1 - z)^{b-1},$$

and its mean and variance are

$$E z = \frac{a}{a + b}, \qquad \operatorname{var} z = \frac{ab}{(a + b)^2(a + b + 1)}, \tag{3.19}$$

respectively, while the skewness and kurtosis coefficients are given by

$$\beta_1^{1/2} = \frac{2(b - a)(a + b + 1)^{1/2}}{(a + b + 2)(ab)^{1/2}}, \tag{3.20}$$

$$\beta_2 - 3 = \frac{6[(a + b + 1)(a - b)^2 - (a + b + 2)ab]}{ab(a + b + 2)(a + b + 3)}. \tag{3.21}$$
The last two coefficients are independent of origin and scale, and thus independent of the range that will be adjusted. Hence we should identify the right-hand sides of (3.20) and (3.21) with those of (3.17) and (3.18), respectively, which gives

$$a = \tfrac{1}{2}(T + A), \qquad b = \tfrac{1}{2}(T - A + 2), \qquad (3.22)$$

terms of higher order of smallness than $T^{0} = 1$ being neglected in the right-hand sides of (3.22). On substituting (3.22) into (3.20) and (3.21), we find that the skewness and kurtosis coefficients of the two distributions are identical to order $T^{-3/2}$ and $T^{-1}$, respectively.
The range of our beta distribution will be written $(c, 4-d)$, which implies that Q is transformed to a beta variable

$$z = \frac{Q - c}{4 - (c+d)} \qquad (3.23)$$

in the range (0, 1). The mean is then

$$Ez = \frac{EQ - c}{4 - (c+d)} = \frac{a}{a+b} = \frac{T+A}{2(T+1)} \qquad (3.24)$$





INDEPENDENCE OF REGRESSION DISTURBANCES 
in accordance with (3.19) and (3.22). In the same way, the variance is 

$$\operatorname{var} z = \frac{\operatorname{var} Q}{\{4-(c+d)\}^2} = \frac{ab}{(a+b)^2(a+b+1)} = \frac{T^2 + 2T - A^2 + 2A}{4(T+1)^2(T+2)}. \qquad (3.25)$$

By substituting $EQ$ as specified in (3.11) and $\operatorname{var} Q$ as specified in (3.15), we obtain the following solutions for c and d:

$$c = \frac{4A^2 - 1}{T^2} \quad\text{and}\quad d = \frac{3}{T^2}, \qquad (3.26)$$

terms of higher order of smallness than $1/T^2$ being neglected. By substituting (3.26) into (3.24) and (3.25), we find that the means coincide to the order $1/T^2$ and the variances to the order $1/T^3$.⁸
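The coincidence orders just stated can be illustrated with a few lines of code. This is a sketch under our own assumptions, not the authors' computation. It uses (3.22) and (3.26) for the fitted beta distribution. For Q itself it takes the mean $2(T-1)/(T-A)$, the variance $4(T^2-3AT+4A-2)/\{(T-A)^2(T-A+2)\}$ and the exact ratio in (3.17), which is our reading of (3.11), (3.15) and (3.17). If the claims hold, the discrepancies scaled by $T^3$ (means and variances) and by $T^{3/2}$ (skewness) should stay bounded as T grows. The function names are hypothetical.

```python
import math

def beta_fit(T, A):
    """Parameters (3.22) and range constants (3.26) of the fitted beta distribution."""
    a, b = (T + A) / 2, (T - A + 2) / 2
    c, d = (4 * A**2 - 1) / T**2, 3 / T**2
    return a, b, c, d

def beta_q_moments(T, A):
    """Mean, variance and sqrt(beta1) of Q = c + (4 - c - d) z with z ~ Beta(a, b)."""
    a, b, c, d = beta_fit(T, A)
    width = 4 - c - d
    mean = c + width * a / (a + b)
    var = width**2 * a * b / ((a + b) ** 2 * (a + b + 1))
    skew = 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))
    return mean, var, skew

def q_closed_forms(T, A):
    """Approximate mean, variance and sqrt(beta1) of the Von Neumann ratio Q."""
    mean = 2 * (T - 1) / (T - A)
    var = 4 * (T**2 - 3*A*T + 4*A - 2) / ((T - A) ** 2 * (T - A + 2))
    skew = (-4 * (A - 1) * (T**2 - 5*A*T + 8*A - 4) * math.sqrt(T - A + 2)
            / ((T - A + 4) * (T**2 - 3*A*T + 4*A - 2) ** 1.5))
    return mean, var, skew

for T in (100, 1000):
    bm, bv, bs = beta_q_moments(T, 3)
    qm, qv, qs = q_closed_forms(T, 3)
    # scaled discrepancies; bounded values support the stated orders of agreement
    print(T, abs(bm - qm) * T**3, abs(bv - qv) * T**3, abs(bs - qs) * T**1.5)
```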

Summarizing, we have adjusted a beta distribution to the distribution of the 
Von Neumann ratio of the least-squares estimated regression disturbances 
under the condition that the first and higher order differences in the explana- 
tory variables are small compared with the range of the corresponding variables 
themselves. The approximation is such that the means coincide to the second order: whereas the asymptotic mean is 2 and hence $O(T^0) = O(1)$, the means coincide to $O(T^{-2})$. The variances, too, coincide to the second order: the asymptotic sampling variance is $4/T$, and the two variances have identical terms to $O(T^{-3})$. The skewness coefficients are identical to the one-and-a-halfth order: the asymptotic value of $\sqrt{\beta_1}$ is zero, and the two coefficients coincide to $O(T^{-3/2})$. Finally, the kurtosis coefficients coincide to the first order, since the asymptotic value of $\beta_2$ is 3 while the two $\beta_2$'s coincide to $O(T^{-1})$. Hence the accuracy of the approximation diminishes for higher-order moments. This is as it should be, for the lower-order moments are the more basic determinants of the distribution.


4. THE PROCEDURE RECOMMENDED 
4.1 TESTING THE NULL-HYPOTHESIS


Table 2 contains the 1 and 5 per cent significance points of Q for testing 
the null-hypothesis of residual independence against the alternative hypothesis that successive disturbances are positively correlated.⁹ The test is thus of the "one-sided" type. The number of coefficients adjusted (A) is 2, 3, ..., 6, where A = 2 refers either to one explanatory variable plus a constant term, or
to two explanatory variables in case the regression is fitted through the origin; 
similarly for higher A-values. The number of observations varies from 15 to 





⁸ A direct approximation of the range of Q consists of approximating the (A+1)st (i.e., the smallest positive) root and the Tth root of K by those of A; see Section 3.1. The roots of A are $(2\sin j\pi/2T)^2$, where $j = 0, \ldots, T-1$; see [9, p. 611]. Hence these roots are $\pi^2A^2/T^2$ and $4 - \pi^2/T^2$, respectively, to order $1/T^2$. This differs slightly from the method developed in the text, which is simpler computationally.

⁹ For computing the significance points of Q we use the relation between the beta variate z and Q given in Section 3.5, viz.,

$$Q = \frac{4A^2 - 1}{T^2} + \left[4 - \frac{4A^2 + 2}{T^2}\right] z.$$

Using the Tables of the Incomplete Beta Function [6], we obtain the values of z corresponding to P = .05 and .01 by inverse interpolation for even values of T + A. The values for odd T + A were obtained by linear interpolation between adjacent values of T, given A. The computations were carried out on the electronic computers of BULL-Nederland N.V. at Amsterdam, and we gratefully acknowledge the help given to us by Dr. J. Berghuis.
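The recipe for the significance points can be sketched without printed tables. In this sketch we substitute a numerical incomplete beta function (composite Simpson's rule plus bisection) for inverse interpolation in Pearson's tables. The resulting points therefore need not agree with Table 2 in the last digit. The function names and the numerical settings (`steps`, `tol`) are our own choices. The sketch assumes $a, b > 1$, which holds here because $a = \tfrac12(T+A)$ and $b = \tfrac12(T-A+2)$.

```python
import math

def beta_cdf(x, a, b, steps=1000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson's rule.
    Adequate for a, b > 1, where the density is smooth and vanishes at 0 and 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(z):
        if z <= 0.0 or z >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(z) + (b - 1) * math.log1p(-z))

    h = x / steps                        # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def beta_ppf(p, a, b, tol=1e-8):
    """Inverse of beta_cdf by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def significance_point(T, A, p):
    """Lower-tail point of Q at level p (e.g. 0.05 or 0.01) from the fitted beta."""
    a, b = (T + A) / 2, (T - A + 2) / 2
    z = beta_ppf(p, a, b)
    return (4 * A**2 - 1) / T**2 + (4 - (4 * A**2 + 2) / T**2) * z

# Q below the point leads to rejection of independence at level p.
print(significance_point(20, 3, 0.05), significance_point(20, 3, 0.01))
```

Because the quantile is computed numerically, this approach does not need the interpolation between even and odd values of T + A described above.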










TABLE 2. SIGNIFICANCE POINTS OF THE VON NEUMANN RATIO 
OF LEAST-SQUARES ESTIMATED REGRESSION DISTURBANCES 





A (Number of coefficients adjusted): 2, 3, 4, 5, 6, with the 1 and 5 per cent points for each A
T (Number of observations): 15 to 100

[Table entries illegible.]


> 
o 



















100, but from 40 onwards they are specified only at intervals of 5. When T > 100, a normal approximation is sufficiently close; one can use

$$\frac{Q - 2(T-1)/(T-A)}{2/\sqrt{T+2}}, \qquad (4.1)$$

which is approximately a normal variate with zero mean and unit variance under the null-hypothesis. It is based on the mean (3.11) and on the variance (3.15) to $O(T^{-2})$. The corresponding significance points are

1 per cent: $Q = 2\left(\dfrac{T-1}{T-A} - \dfrac{2.32635}{\sqrt{T+2}}\right)$,

5 per cent: $Q = 2\left(\dfrac{T-1}{T-A} - \dfrac{1.64485}{\sqrt{T+2}}\right)$,





which differ from the values of Table 2 for T = 90 only in the third decimal place. This result also shows that the limit of the significance points is 2, that the convergence to this limit takes place with speed $1/\sqrt{T}$, and that the effect of different numbers of coefficients adjusted is of the order $1/T$.
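For T > 100 the normal approximation (4.1) is easy to apply directly. The following is a minimal sketch. The dictionary `Z_ONE_SIDED` holds the upper-tail normal points 2.32635 and 1.64485 quoted in the text, and the function names are our own.

```python
import math

# Upper-tail standard normal points for the one-sided test (from the text).
Z_ONE_SIDED = {0.01: 2.32635, 0.05: 1.64485}

def normal_statistic(Q, T, A):
    """Statistic (4.1): approximately N(0, 1) under the null-hypothesis for large T."""
    return (Q - 2 * (T - 1) / (T - A)) / (2 / math.sqrt(T + 2))

def normal_point(T, A, level):
    """Significance point of Q against positively correlated disturbances."""
    return 2 * ((T - 1) / (T - A) - Z_ONE_SIDED[level] / math.sqrt(T + 2))

# Independence is rejected at the chosen level when Q falls below the point.
T, A = 120, 3
print(normal_point(T, A, 0.05), normal_point(T, A, 0.01))
```

Because both the mean $2(T-1)/(T-A)$ and the standard deviation $2/\sqrt{T+2}$ enter explicitly, the limit 2, the $1/\sqrt{T}$ speed of convergence, and the $1/T$ effect of A can be read off the second function.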

A comparison of the table entries for A = 3 and T = 70 and 17 with the numerical values obtained in (2.7) shows that the null-hypothesis should be rejected for the spirits example and not rejected for the textile example. In both cases this holds at both the 1 and the 5 per cent level. A comparison of Table 2 with the tables prepared by Durbin and Watson shows that our results are very close to their upper limits $Q_U$. This is not surprising, for these limits are attained when the vectors of values taken by the explanatory variables are all characteristic vectors of A as defined in (3.3) corresponding to a zero root. This simply implies that the first differences of these variables should vanish.¹⁰ That their tables are not identical with ours is due to the fact that different approximations were used; we do not wish to claim that our table is more accurate than theirs.


4.2 WHAT TO DO WHEN THE NULL-HYPOTHESIS IS REJECTED 


The simplest alternative hypothesis to the null-hypothesis is that of a first- 
order Markov scheme: 


$$u(t) = \rho u(t-1) + \epsilon(t), \qquad 0 < \rho < 1, \qquad (4.2)$$

the $\epsilon(t)$ being mutually independent normal deviates with constant variance. The parameter ρ is the first-order autocorrelation of the disturbances and is, of course, unknown. But if we decide to reject the null-hypothesis and to re-estimate the regression coefficients by taking account of the autocorrelation, we can estimate ρ conveniently by means of Q. For Q can be regarded as an estimator of





¹⁰ This could be used as an argument to use the upper limits immediately; but it is of course more satisfactory
to indicate precisely the nature of the approximation errors involved. 










$$\frac{\operatorname{var}(u - u_{-1})}{\operatorname{var} u} = 2(1 - \rho),$$

hence we could take $\hat\rho = 1 - \tfrac{1}{2}Q$. But we can proceed in a slightly more refined manner by using the limits c and 4 − d as specified in (3.26). These indicate that Q is confined to a smaller range than (0, 4) even if the null-hypothesis is true. We shall define $\hat\rho$ as a decreasing linear function of Q which takes the value 1 when Q = c and the value −1 when Q = 4 − d. This gives

$$\hat\rho = \frac{T^2(2 - Q) + 2A^2 - 2}{2T^2 - 2A^2 - 1},$$

which is virtually equivalent to

$$\hat\rho = \frac{T^2(1 - \tfrac{1}{2}Q) + A^2}{T^2 - A^2}. \qquad (4.3)$$
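The two expressions for $\hat\rho$ can be compared with a short sketch. It takes c and d from (3.26) and checks the endpoint conditions and the near-equivalence with (4.3). The function names are hypothetical.

```python
def rho_hat_range(Q, T, A):
    """Decreasing linear function of Q equal to 1 at Q = c and -1 at Q = 4 - d,
    with c = (4A^2 - 1)/T^2 and d = 3/T^2 as in (3.26)."""
    c, d = (4 * A**2 - 1) / T**2, 3 / T**2
    return 1 - 2 * (Q - c) / (4 - c - d)

def rho_hat(Q, T, A):
    """The simplified estimator (4.3)."""
    return (T**2 * (1 - Q / 2) + A**2) / (T**2 - A**2)

# For moderate T the two forms differ only slightly; the naive 1 - Q/2 is cruder.
Q, T, A = 0.3, 70, 3
print(rho_hat_range(Q, T, A), rho_hat(Q, T, A), 1 - Q / 2)
```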


On applying (4.3) to the spirits example [see (2.7)], we obtain $\hat\rho = 0.879$. Let us then write the regression equation

$$y(t) = \alpha + \beta_1 x_1(t) + \beta_2 x_2(t) + u(t)$$

in the form

$$y(t) - \rho y(t-1) = \alpha(1-\rho) + \beta_1\{x_1(t) - \rho x_1(t-1)\} + \beta_2\{x_2(t) - \rho x_2(t-1)\} + \{u(t) - \rho u(t-1)\}, \qquad (4.4)$$


which is obtained by subtracting ρ times the regression equation lagged one year. On combining (4.4) with (4.2), we see that the disturbances of (4.4) are the ε's of (4.2), so that the autocorrelation difficulty is now removed. We can also write (4.4) in the form:

$$(1-\rho)y(t) + \rho\Delta y(t) = \alpha(1-\rho) + \beta_1\{(1-\rho)x_1(t) + \rho\Delta x_1(t)\} + \beta_2\{(1-\rho)x_2(t) + \rho\Delta x_2(t)\} + \epsilon(t),$$

which shows that least-squares estimation requires such sums of squares and products as

$$\sum\{(1-\rho)y(t) + \rho\Delta y(t)\}\{(1-\rho)x_1(t) + \rho\Delta x_1(t)\} = (1-\rho)^2\sum y(t)x_1(t) + \rho(1-\rho)\sum y(t)\Delta x_1(t) + \rho(1-\rho)\sum x_1(t)\Delta y(t) + \rho^2\sum \Delta y(t)\Delta x_1(t).$$


Given ρ, these sums of squares and products can be computed by using Table 1, which contains the moments of the original variables and of their first differences, and Table 3, which contains such "mixed" moments as $\sum y(t)\Delta x_i(t)$ involving both first differences and the original variables. Using $\hat\rho = 0.879$ instead of ρ, we find 0.312 for the estimated income elasticity and −1.123 for the estimated price elasticity. These are indeed more plausible results than (2.2); the corresponding standard errors are 0.165 and 0.077, respectively.
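The transformation (4.4) can also be applied directly to the time series rather than through moment tables. The sketch below is not the authors' computation. It is a minimal illustration, assuming the first observation is simply dropped (the text does not say how the first year is treated) and that ρ is replaced by a given estimate such as $\hat\rho$. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def quasi_difference(v, rho):
    """v(t) - rho*v(t-1) for t = 2, ..., T, i.e. (1 - rho)v(t) + rho*Delta v(t).
    The first observation is dropped (an assumption of this sketch)."""
    v = np.asarray(v, dtype=float)
    return v[1:] - rho * v[:-1]

def refit_with_rho(y, x1, x2, rho):
    """Least squares on the transformed equation (4.4) for a given rho.
    Returns estimates of (alpha, beta1, beta2)."""
    ys = quasi_difference(y, rho)
    X = np.column_stack([
        np.full(ys.shape, 1.0 - rho),     # constant term becomes alpha*(1 - rho)
        quasi_difference(x1, rho),
        quasi_difference(x2, rho),
    ])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return coef

# Illustration with synthetic data whose disturbances follow the Markov scheme (4.2).
rng = np.random.default_rng(0)
T = 70
x1, x2 = rng.normal(size=T).cumsum(), rng.normal(size=T).cumsum()
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + rng.normal(scale=0.1)
y = 1.0 + 0.5 * x1 - 1.2 * x2 + u
print(refit_with_rho(y, x1, x2, rho=0.8))
```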







TABLE 3. ADDITIONAL SUMS OF PRODUCTS FOR THE SPIRITS EXAMPLE

         y            x₁            x₂
    −1.048 513    −1.359 344    −1.554 634
     0.611 650     0.694 187     0.761 012
     0.853 259     1.057 843     1.224 067





It should be noted that this procedure involves two different kinds of approximation. The first is the obvious approximation $\rho = \hat\rho$, which neglects the sampling variability of $\hat\rho$. The second is less obvious but nonetheless fundamental: we use our data for three purposes, viz., to test the independence of the disturbances, to estimate the dependence in case the hypothesis is rejected, and to re-estimate the regression coefficients in that same case. This procedure clearly violates the classical set-up of regression theory. It is therefore safe to regard the standard errors just mentioned as understating the unreliability of the point estimates, at least on the average.


APPENDIX 
TABLE 4. TIME SERIES FOR THE TEXTILE EXAMPLE 





       y          x₁         x₂
    1.99651    1.98543    2.00432
    1.99564    1.99167    2.00043
    2.00000    2.00000    2.00000
    2.04766    2.02078    1.95713
    2.08707    2.02078    1.93702
    2.07041    2.03941    1.95279
    2.08314    2.04454    1.95713
    2.13354    2.05038    1.91803
    2.18808    2.03862    1.84572
    2.18639    2.02243    1.81558
    2.20003    2.00732    1.78746
    2.14799    1.97955    1.79588
    2.13418    1.98408    1.80346
    2.22531    1.98945    1.72099
    2.18837    2.01030    1.77597
    2.17319    2.00689    1.77452
    2.21880    2.01620    1.78746














Source: y = logarithm of per capita consumption of textile, obtained by dividing the money value of textile consumption by family households (Statistische en econometrische onderzoekingen, 4, 1949, No. 3, 136-9) by pN;

p = retail price index of clothing for the city of Amsterdam (internal data of the Bureau van Statistiek der Gemeente Amsterdam);

N = population of the Netherlands (Central Bureau of Statistics, Bevolking van Nederland naar leeftijd en geslacht, 1900-1962, 1953, pp. 12-18);

x₁ = logarithm of real per capita income, obtained by dividing the money value of income of family households (Statistische en econometrische onderzoekingen, 4, 1949, No. 3, 102-3) by πN;

π = general retail price index (Maandschrift van het Centraal Bureau voor de Statistiek);

x₂ = logarithm of the deflated price index of clothing, i.e., of the ratio p/π.








REFERENCES 

[1] Durbin, J., and Watson, G. S., "Testing for Serial Correlation in Least Squares Regression, I," Biometrika, 37 (1950), 409-28.
[2] Durbin, J., and Watson, G. S., "Testing for Serial Correlation in Least Squares Regression, II," Biometrika, 38 (1951), 159-78.
[3] Hannan, E. J., "Exact Tests for Serial Correlation," Biometrika, 42 (1955), 133-42.
[4] Hannan, E. J., "An Exact Test for Serial Correlation in Time Series," Biometrika, 42 (1955), 316-26.
[5] Hildreth, C., and Lu, J. Y., Demand Relations with Autocorrelated Disturbances. Technical Bulletin No. 276, Agricultural Experiment Station, Michigan State University, 1960.
[6] Pearson, K., Tables of the Incomplete Beta Function. Cambridge, 1948.
[7] Prais, S. J., and Houthakker, H. S., The Analysis of Family Budgets. Cambridge, 1955.
[8] Prest, A. R., "Some Experiments in Demand Analysis," The Review of Economics and Statistics, 31 (1949), 33-49.
[9] Ruben, H., "Probability Content of Regions Under Spherical Normal Distributions, I," The Annals of Mathematical Statistics, 31 (1960), 598-618.
[10] Theil, H., Economic Forecasts and Policy. Second Edition. Amsterdam, 1961.





RECTIFYING INSPECTION OF LOTS* 


F. J. ANSCOMBE 
Princeton University and University of Chicago 


The problem discussed is how to choose a plan of rectifying inspection for a lot or sequence of lots. The purpose of the inspection is minimization of total costs. Three types of inspection plan are considered,
namely, no inspection at all, 100 per cent inspection, and sampling 
inspection according to a sequential plan with linear boundary. Under 
certain assumptions, it is shown that a near-optimum plan can easily 
be found. When a single cost factor k, defining the break-even quality, 
has been determined, the plan may be written down at once, according 
to a procedure given at the end of §3. It is suggested that for practical 
purposes there would be little to be gained by refining the procedure, 
because of vagueness in the data of the problem. The sequential plan 
is compared with single sampling. A subtle economic and legal prob- 
lem concerned with the testing of electric meters is discussed at 
some length. This paper is similar in purpose to a previous paper [1] 
dealing with inspection of a continuous output. 


1. INTRODUCTION 


In a previous paper [1] I have considered rectifying inspection of a continuous output, from the point of view of minimizing total costs. For linear cost functions, and on the supposition that deferred sentencing methods were not practicable, it was shown that very simple rules could be given for the choice of a near-optimum inspection plan. Such a plan would consist of no inspection at all, of 100 per cent inspection, or of sampling inspection according to a Dodge plan with specified parameters.¹

The purpose of the present paper is to provide similar very simple rules for 
the choice of a near-optimum plan for rectifying inspection of a lot. This brings 
to fruition a line of thought previously followed rather tentatively (in reference 
[23] and the first two references listed in reference [1]). For discussion of aims, 
and references to the literature of economic analysis of inspection, see refer- 
ence [1]. 

A short supplementary bibliography of economic analysis of inspection is 
given at the end of this paper. Particularly noteworthy in the present context 
is the distinction drawn clearly and persuasively by Schlaifer [15] between 
two types of inspection problem, with reference to the independence or inter- 
dependence of successive decisions. In reality, no inspection problem is alto- 
gether static. The quality aimed for depends to some extent on the quality 
that has recently been achieved. Competition, production methods and in- 
spection resources are liable to change. The inspection decision that is wisest 
regarding any particular lot or quantity of product is not wholly independent 





* Research carried out in part at Princeton University and in part at the Department of Statistics, University 
of Chicago, under sponsorship of the Logistics and Mathematical Statistics Branch, Office of Naval Research. Re- 
production in whole or in part is permitted for any purpose of the United States Government. 

The paper was read at the Central Region Meeting of the Institute of Mathematical Statistics, Lafayette, 
April 9, 1960. 

¹ I regret the error in the second displayed equation of §8, p. 712, where the factor 100 should be replaced by 1;
see Corrigenda, Vol. 54, p. 810. A reader interested in deferred sentencing may care to refer to the recent paper by 
Hill, Horsnell and Warner [25]. 









of recent decisions and future policy. However, as a first approximation it is 
reasonable to contrast the following two sorts of inspection: 

(a) Final inspection of a product by the producer. This is often rectifying or cor- 
rective, aiming to improve the quality if poor by correcting or replacing defectives; 
but it may incorporate the possibility of “rejection,” i.e., downgrading or scrapping 
product that appears to be of poor quality. 


(b) Acceptance inspection of a product by a would-be buyer, who decides whether 
to buy or not. 


The contrast is this, that in (a) the number of lots (or quantity of output) 
inspected is independent of the decisions made, being the whole of the output, 
while in (b) what is predetermined is not normally the quantity of product in- 
spected, but rather the quantity bought. That is, the buyer has certain needs 
and he will try if possible to fulfill them; the more lots he declines, the more lots 
he must inspect in order to buy the requisite number. In case (b) there is an 
obvious interdependence between inspection decisions. Alternative inspection 
plans can be compared by evaluating the average total cost per lot accepted, 
not the average total cost per lot inspected. The cost of rejecting a lot depends 
not only on the quality of the lot itself but also on how many other lots are re- 
jected. In case (a), on the other hand, it may be reasonable to ignore any such 
interdependence, and the appropriate costs to consider are costs per lot in- 
spected. One can think of conditions where this would not be so. If the pro- 
ducer overproduces, it may be wise for him to try to sell only his output of 
best quality, irrespective of the quality of his less good output. If his inspection 
resources are strained, it may be wise for him to concentrate on rectifying the 
worst lots, leaving the less bad lots for later treatment if time allows. But in 
more normal circumstances (at any rate, one hopes they are more normal), the 
best inspection decision will depend on the available evidence concerning the 
quality of the product being inspected, but not directly on other inspection de- 
cisions. Each decision can be considered separately. 

Rectifying inspection in case (a), with independent decisions, is the topic of 
the present paper and of [1]. At a certain modest level of sophistication—only 
the simplest linear cost assumptions are made, and inspection plans defined by 
no more than two parameters are considered—these two papers would seem to 
provide a complete answer to the practical problem of choosing plans for 
rectifying inspection. 


2. FORMULATION OF THE INSPECTION PROBLEM 


We have to choose an inspection procedure, to be applied to a lot of articles 
that is presented for inspection. Usually, but not necessarily, the lot is a mem- 
ber of a sequence, and the same procedure will be applied to all the lots. In 
that case, the purpose of the inspection will be to minimize the long-run average 
of total cost, consisting of the inspection cost proper plus the ultimate losses 
resulting from passing bad material. If an isolated lot is under consideration, 
the purpose of the inspection will be to minimize some sort of expectation of the 
total cost. 

The following assumptions will be made. Any articles that are inspected will 
be classified as either “correct” or “defective.” Inspection, when performed, is 







always performed correctly, so that articles are never misclassified. Defectives 
cannot be picked out at a glance, but only by systematically performing the 
operation of inspection on all the articles presented. Sampling is strictly ran- 
dom. The cost of inspecting the articles is proportional to the number inspected, 
and is the same whether the articles are sampled in groups or one by one. When 
a defective is found during the inspection, it is immediately rectified or re- 
placed by a correct article, the cost of doing this being the same each time. If 
a part of the lot is not inspected, each defective in it will cause an ultimate loss, 
the same for each. 
We shall use the following notation. 


Total number of articles in the lot = N. 

Total number of defectives in the lot before inspection = Y. 

Cost of inspecting any proportion ξ of the lot (where 0 ≤ ξ ≤ 1) = kξ cost-units.

Cost of replacing or rectifying a defective article found during the inspection = a cost-units.

Ultimate expected loss due to passing a defective = 1 + a cost-units.


Thus the difference between the loss caused by passing a defective and the 
cost of replacing or rectifying the defective during inspection will be taken as 
the unit of cost. The cost of inspecting the lot 100 per cent is k times that 
amount. This choice of cost-unit is convenient for immediate purposes, al- 
though it does have a drawback. The ultimate loss resulting from passing a 
defective will often be the component of the economic situation that is least 
surely known or most open to dispute, and then the cost-unit itself will be 
disputable. The matter is discussed below at the appropriate place. For the 
present, it will be supposed that the lot size N and the cost coefficient k are 
known. Y will of course normally be unknown. The value of a is irrelevant to 
the choice of inspection plan, and it will be convenient to suppose a=0. 


3. CHOICE OF INSPECTION PLAN 


If we knew for sure that Y was less than k (being otherwise unknown), the best plan would be not to inspect at all. If we knew that Y was greater than k, the best plan would be to inspect the whole lot.² If we do not have such knowledge, and consider that values of Y both less than k and greater than k must be reckoned with, we should try to choose a sampling plan such that the expected sample size is small if in fact Y < k and large if in fact Y > k.

Suppose articles are sampled and inspected one by one. Let y denote the total number of defectives found when a proportion ξ of the lot has been inspected. The sequence of points (ξ, y) forms a sample path. If the whole lot is inspected, the sample path necessarily terminates at (1, Y). Everywhere, y is a nonnegative integer and ξ is an integer multiple of 1/N.





² This statement is not quite true without some qualification. If the value of Y were known exactly in advance,
there would be an optimum sequential inspection plan which would not always reduce to either no inspection or 100 
per cent inspection; this has been shown in the second paper listed in [1]. However, any such certain knowledge of Y 
would be most extraordinary in practice, and for practical purposes the statement that if Y were known to be greater 
than k the best plan would be to inspect 100 per cent is near enough correct. The same comment applies below at 
(5) where the regret is defined. 







Consider a sequential stopping rule defined by a boundary (a set of boundary points) such that inspection ceases if the sample path reaches any boundary point. Presumably the best such boundary is not far from the straight line joining (0, 0) to (1, k). For simplicity, let us consider a linear boundary consisting of the points $(\alpha, 0), (\alpha+\beta, 1), (\alpha+2\beta, 2), \ldots$, that is, the points $(\alpha+y\beta, y)$, where $0 \le y \le [(1-\alpha)/\beta]$, the integer part of $(1-\alpha)/\beta$. Here α and β are constants,³ integer multiples of 1/N. The boundary will be parallel to the line joining (0, 0) to (1, k) if β = 1/k. We shall in fact suggest that β be chosen equal to 1/k, but first let us examine the properties of a linear boundary with arbitrary α and β.

We shall suppose N very large relative to Y, so that ξ may be treated as a
continuous variable. This will simplify the expressions a little, without altering 
the essential character of the problem. 

For fixed Y, under random sampling, the chance $f_{y|Y}$ that the sample path will terminate at any boundary point $(\alpha+y\beta, y)$ was found in reference [23]; see also Plackett [26]. We have

$$f_{y|Y} = \binom{Y}{y}\alpha(\alpha+y\beta)^{y-1}(1-\alpha-y\beta)^{Y-y} \qquad (1)$$

for $0 \le y \le \min(Y, [(1-\alpha)/\beta])$. If $Y \le [(1-\alpha)/\beta]$, the sample path will necessarily meet the boundary, and so $\sum_y f_{y|Y} = 1$. But if $Y > [(1-\alpha)/\beta]$, there is the chance $1 - \sum_y f_{y|Y}$ that the sample path will not meet the boundary but will terminate at (1, Y).

The total expected cost, given Y, which we shall denote by $C_Y$, is

$$C_Y = \sum_y\{(\alpha+y\beta)k + (Y-y)\}f_{y|Y} + k\Big(1 - \sum_y f_{y|Y}\Big). \qquad (2)$$

If β = 1/k, this reduces to

$$C_Y = k + \Big(\sum_y f_{y|Y}\Big)\{Y - k(1-\alpha)\}, \qquad (3)$$

and if also $Y \le k(1-\alpha)$, we have $\sum_y f_{y|Y} = 1$ and so

$$C_Y = Y + k\alpha. \qquad (4)$$

Now if no inspection were done at all, we should have $C_Y = Y$, and if 100 per cent inspection were done without any sampling plan, we should have $C_Y = k$. Thus the minimum unavoidable cost, caused by the presence of defectives in the lot, is Y if Y < k and k if Y > k. When a sampling plan having the expected cost $C_Y$ is used, the expected excess cost above this unavoidable minimum (the "regret") is therefore

$$\Delta C_Y = C_Y - \min(Y, k). \qquad (5)$$
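Equations (1) to (5) are simple to evaluate. The sketch below assumes, as in the text, that N is large enough for ξ to be treated as continuous, and that the rectification cost a is 0. The function names are illustrative only. The guard `max(0.0, ...)` only protects against rounding when a boundary point lies at ξ = 1.

```python
import math

def boundary_probs(Y, alpha, beta):
    """f_{y|Y} of (1) for y = 0, ..., min(Y, [(1 - alpha)/beta])."""
    m = math.floor((1 - alpha) / beta)           # number of the last boundary point
    probs = []
    for y in range(min(Y, m) + 1):
        p = alpha + y * beta                     # proportion inspected when stopping at y
        probs.append(math.comb(Y, y) * alpha * p ** (y - 1) * max(0.0, 1 - p) ** (Y - y))
    return probs

def expected_cost(Y, k, alpha, beta):
    """C_Y of (2): inspection cost plus one unit per defective passed."""
    f = boundary_probs(Y, alpha, beta)
    stopped = sum(((alpha + y * beta) * k + (Y - y)) * fy for y, fy in enumerate(f))
    return stopped + k * (1 - sum(f))            # remaining chance: 100 per cent inspection

def regret(Y, k, alpha, beta):
    """Delta C_Y of (5)."""
    return expected_cost(Y, k, alpha, beta) - min(Y, k)

# With beta = 1/k and Y <= k(1 - alpha), (4) predicts C_Y = Y + k*alpha.
k, alpha = 16, 0.1
print(expected_cost(10, k, alpha, 1 / k), regret(30, k, alpha, 1 / k))
```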


We should like to minimize the expectation of $\Delta C_Y$ with respect to some distribution of values of Y: a probability distribution, or a distribution of guessed relative frequency of occurrence in future lots. If such a distribution is completely specified, we can (in principle) obtain the expected value of $C_Y$, or of





³ I have retained my previous notation. Fortunately there are no errors of the first and second kinds in this
problem. Duncan [24] uses fi and /; in place of a and 8. 







$\Delta C_Y$, as a function of α and β, and determine where it is minimized. In practice, however, we shall probably hesitate over specifying the distribution precisely; in any case, the ensuing computation may seem excessively troublesome.
Let us see what can be done if only rather broad qualitative statements are 
made about the distribution of Y. 

It is rather hard, though not impossible, to think of circumstances in which 
the distribution of Y has less spread than a Poisson distribution (or, if we do 
not assume N to be large compared with Y, less spread than a binomial dis- 
tribution). Most often it would be reasonable to suppose the distribution to be 
some average of Poisson distributions. Let us begin, therefore, by averaging $C_Y$ over a Poisson distribution for the values of Y, say with mean λ. It is convenient to start afresh and find the chance $f_{y|\lambda}$ that the sample path reaches any boundary point $(\alpha+y\beta, y)$, for random sampling of a lot such that Y is a random variable following a Poisson distribution with mean λ. We obtain easily


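$$f_{y|\lambda} = \frac{\lambda^y}{y!}\,\alpha(\alpha+y\beta)^{y-1}e^{-\lambda(\alpha+y\beta)}. \qquad (6)$$

The total expected cost, given λ, is

$$C_\lambda = \sum_y\{(\alpha+y\beta)k + (1-\alpha-y\beta)\lambda\}f_{y|\lambda} + k\Big(1 - \sum_y f_{y|\lambda}\Big) = k + (\lambda-k)\sum_y(1-\alpha-y\beta)f_{y|\lambda}. \qquad (7)$$

We may now reckon the unavoidable minimum cost to be λ if λ < k and k if λ > k, and so we may define the regret to be

$$\Delta C_\lambda = C_\lambda - \min(\lambda, k). \qquad (8)$$

Evidently $C_\lambda = k$ and $\Delta C_\lambda = 0$ if λ = k. Let us now decide to choose β equal to 1/k. It remains to choose α. Since $C_\lambda$ is an average of values of $C_Y$, we shall expect from (4) that, if λ is sufficiently smaller than k for values of Y above k to be improbable, then

$$C_\lambda = \lambda + k\alpha, \qquad \Delta C_\lambda = k\alpha. \qquad (9)$$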


It appears from numerical computation of (7) that $\Delta C_\lambda$ takes the constant value kα for small λ, decreases to 0 as λ approaches k, then rises to a maximum and falls towards 0 again as λ increases above k (see Fig. 1). As α is decreased, $\Delta C_\lambda$ is decreased for λ < k but increased for λ > k. It is necessary to strike a balance. The best value for α will depend on the relative weight to be attached to values of λ below k and above k. If no precise distribution for λ is specified, but values of λ both below and above k are considered possible, a reasonable compromise would be to choose the minimax value for α. This is the value of α for which the maxima of $\Delta C_\lambda$ for λ < k and for λ > k are equal, thus

$$\sup_{\lambda > k}\Delta C_\lambda = k\alpha. \qquad (10)$$


For most likely distributions for λ, this minimax α will err on the side of being too large, so perhaps it should be described as a suggested upper bound for α.
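The minimax condition (10) can be located numerically from (6) to (8). This sketch is our own, not Anscombe's computation. It approximates the supremum on a grid of λ values in (k, 4k] and solves (10) for α by bisection, with β = 1/k. The grid range, grid size and function names are assumptions of the sketch.

```python
import math

def poisson_boundary_probs(lam, alpha, beta):
    """f_{y|lambda} of (6) for y = 0, ..., [(1 - alpha)/beta], computed in logs."""
    m = math.floor((1 - alpha) / beta)
    out = []
    for y in range(m + 1):
        p = alpha + y * beta
        log_f = (y * math.log(lam) - math.lgamma(y + 1) + math.log(alpha)
                 + (y - 1) * math.log(p) - lam * p)
        out.append(math.exp(log_f))
    return out

def regret_poisson(lam, k, alpha, beta):
    """Delta C_lambda from (7) and (8)."""
    f = poisson_boundary_probs(lam, alpha, beta)
    c_lam = k + (lam - k) * sum((1 - alpha - y * beta) * fy for y, fy in enumerate(f))
    return c_lam - min(lam, k)

def minimax_alpha(k, lam_max_factor=4.0, grid=400, iters=30):
    """Solve sup_{lambda > k} Delta C_lambda = k*alpha (10) by bisection, beta = 1/k."""
    beta = 1.0 / k
    lams = [k * (1 + (lam_max_factor - 1) * i / grid) for i in range(1, grid + 1)]

    def excess(alpha):
        # positive when alpha is too small (regret above k is too large)
        return max(regret_poisson(l, k, alpha, beta) for l in lams) - k * alpha

    lo, hi = 1e-6, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = 16
a_star = minimax_alpha(k)
print(a_star, a_star * math.sqrt(k))   # compare with Fig. 1 and with 0.375/sqrt(k)
```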

It appears from computations that the minimax α is nearly a constant multiple of $1/\sqrt{k}$; the multiple decreases slightly as k increases. It is shown in the



















[Fig. 1: $\Delta C_\lambda$ against λ; abscissa marked from 4 to 36]





Fig. 1. Expected loss or regret, $\Delta C_\lambda$, plotted as ordinate against λ as abscissa for two linear sequential plans when k = 16:

(i) α = 0.0969, β = 0.0625 (the minimax plan),

(ii) α = 0.0800, β = 0.0625.


appendix that, asymptotically for large k, the minimax α is very nearly equal to $0.375/\sqrt{k}$. We thus obtain the following:

Recommended procedure for rectifying inspection of a lot. Estimate the break-even number of defectives k, such that if Y = k, no inspection and 100 per cent inspection are equally costly. If confidence is felt that Y will not exceed k appreciably, do no inspection at all. If confidence is felt that Y will not be appreciably less than k, inspect 100 per cent. If values of Y both below and above k are expected, take successive random samples, the first of size αN and all the others of size βN. Continue until either the total number of defectives found is one less than the number of samples inspected, or the whole lot has been inspected. The constants α and β are chosen as follows: β = 1/k and $\alpha = 0.375/\sqrt{k}$ (or possibly a little less). Thus the size of the first sample is $0.375\sqrt{k}$ times that of all subsequent samples (or possibly a little less). The subsequent samples are of such a size that, if Y = k, the expected number of defectives per sample is equal to 1.
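To make the stopping rule concrete, here is a hedged simulation of the recommended procedure on a single lot. The lot is given as a list of booleans (True = defective). The sketch assumes that the sample sizes αN and βN are rounded to the nearest integer, which the text does not specify. It also draws the samples one at a time, although they may equally be taken several at once. The function name `inspect_lot` and its parameters are hypothetical.

```python
import math
import random

def inspect_lot(lot, k, alpha_multiplier=0.375, rng=None):
    """Simulate the recommended plan on one lot (True = defective article).
    Returns (number of articles inspected, number of defectives found)."""
    rng = rng or random.Random()
    N = len(lot)
    alpha, beta = alpha_multiplier / math.sqrt(k), 1.0 / k
    first_size = max(1, round(alpha * N))   # first sample: about alpha*N articles
    later_size = max(1, round(beta * N))    # later samples: about beta*N articles
    order = list(range(N))
    rng.shuffle(order)                      # random sampling without replacement
    inspected = found = samples = 0
    size = first_size
    while inspected < N:
        batch = order[inspected:inspected + size]
        inspected += len(batch)
        found += sum(1 for i in batch if lot[i])   # defectives found are rectified
        samples += 1
        if found <= samples - 1:            # defectives found = samples - 1: stop
            break
        size = later_size
    return inspected, found

# A lot of 1000 articles containing 10 defectives, break-even k = 16:
lot = [True] * 10 + [False] * 990
print(inspect_lot(lot, 16, rng=random.Random(0)))
```

The stopping test uses `<=` only as a safeguard. The number of defectives found can never fall below the number of samples minus one without the rule having stopped earlier.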


4. DISCUSSION 


The above recommended procedure cannot, of course, claim to be the best possible. It is better than any single-sampling plan, and considerably better if k is large. If a definite distribution for Y is assumed, the above plan will not (in general) be the best of all possible linear sequential plans, still less the best of all possible sequential plans. Especially if k is large, it seems likely that a sequential plan with a slightly curved boundary (convex from below) would be superior under the cost assumptions that have been made. But the amount to be gained by refining the plan, in the light of a given distribution for Y, must be less than the above minimax regret of about $0.375\sqrt{k}$ cost units, and this will usually appear rather small. I believe that some years ago Professor D. G. Champernowne determined the best sequential plan, given that Y followed a beta distribution with specified parameters, but he has not published this. Wurtele [27] investigated a curved-boundary plan for guaranteeing a lot tolerance, but that is a rather different matter.

In the present problem, the minimax principle seems reasonable, as a method 







of reaching a rule-of-thumb compromise. What is examined above is the 
minimax of AC), not of ACy. The two are not greatly different. Thus when 
k=10, the minimax AC), occurs for a=0.123, while the minimax ACy occurs 
for a=0.136, approximately. The minimax of AC, appears to me the more 
reasonable to consider. 

It may be noted, in connection with any sampling plan of this kind, that the successive samples need not be taken singly. If there are five defectives in the first sample, at least five more samples will be required, and they can be taken all at once. Then, if a further three defectives have been found, a further three samples can be taken at once; and so on.
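The batching remark follows from the stopping rule. Each further sample adds one to the count of samples but cannot reduce the count of defectives. The rule therefore cannot be met until at least (defectives found) − (samples taken − 1) more samples have been drawn. The minimal Python sketch below uses a hypothetical function name:

```python
def further_samples_needed(samples_taken, defectives_found):
    """Minimum number of further samples before the stopping rule can be met.

    The plan stops when defectives_found == samples_taken - 1. Each further
    sample raises samples_taken by 1 and cannot lower defectives_found, so at
    least defectives_found - (samples_taken - 1) more samples are needed.
    """
    return max(0, defectives_found - (samples_taken - 1))


# The text's example: 5 defectives in the first sample -> take 5 more at once;
# if those 5 samples contain 3 more defectives (8 in 6 samples) -> take 3 more at once.
print(further_samples_needed(1, 5), further_samples_needed(6, 8))
```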

We have supposed the cost coefficient k to be known. In practice, not only 
are we likely to hesitate over specifying a precise prior distribution for Y, but 
the value of k may be somewhat vague too. It is shown in the appendix that if 
the assumed value for k is inaccurate by some percentage, the true regret
function reaches a maximum which is of the order of k, not √k, when k is large.
This is another reason why it seems superfluous to refine the sampling plan 
any further. When large costs are involved, it would be worth while to devote 
some care to obtaining a fairly precise estimate of k. Without this, no plan, how- 
ever simple or complicated, can claim to be very economical. 


5. THE TESTING OF ELECTRIC METERS 


The above investigation was prompted by some private discussion concern- 
ing the testing of domestic electric (watthour) meters. Inspection of electric 
meters raises a subtle economic and legal problem, of a type that seems to be 
widespread in the retailing of goods and services. Some account will be given 
of it now, and the possible application of the sampling plan derived above will 
be indicated. 

The watthour meters supplied to households by a Public Utility Corporation 
are not infallible. After some years in service, meters may be found to be read- 
ing fast or slow, so that the customer is, as the case may be, overcharged or 
undercharged for the electricity he consumes. Some older types of meter suffer 
this damage because of current surges produced by lightning; and there are 
various other reasons why a meter may eventually go out of adjustment. If the 
meter runs slow, the Public Utility Corporation is the loser; if it runs fast, the 
consumer. To prevent injustice to the consumer and to protect the corporation, 
the state regulatory commissions usually require regular inspection of meters. 
A typical ruling would be that all meters must be tested once every eight years. 
But the newer types of meter have been found to maintain high accuracy for 
longer than eight years, and the corporations have claimed that 100 per cent 
testing of all meters indiscriminately every eight years is unduly wasteful. 
Recently some regulatory commissions have allowed sampling inspection, and 
others are considering the matter. To attain justice for the consumer in such 
schemes, it has been suggested that the commission should require that not more
than some small percentage, say 3 per cent, of the meters of any particular 
type and age in service should run faster than some stipulated margin, say 
2 per cent fast. This looks like a lot tolerance, and proposals have been made to 
meet it by a rectifying inspection plan aimed at satisfying the lot tolerance
with some high confidence coefficient, such as 99 per cent, for minimum
inspection. (By a lot tolerance I mean a standard of barely acceptable quality of
the lot after inspection. The confidence coefficient is the lower bound or infimum 
of the chance that the outgoing quality will be not worse than the lot tolerance, 
for all possible incoming qualities. The complement of the confidence co- 
efficient is known as the consumer’s risk.) 

This inspection situation has several unusual features. The total number of 
meters of given type and age in service in any one city or region can be very 
large; a figure of 10,000 has been mentioned as typical. This is much larger 
than the lots usually thought about in connection with sampling inspection
plans. Strictly random sampling presents no special difficulty, since consumers 
are listed on cards. The time required for 100 per cent inspection of a lot is 
many months, and the extra cost, due to complexity, in following an elaborate 
sequential sampling plan, rather than a very simple plan, is negligible. Finally, 
at first glance the lot tolerance concept seems to be appropriate. It seems 
plausible that a legal tolerance should be imposed on lot quality. 

However, when one asks, why should the lot tolerance be fixed at 3 per cent 
of meters fast, and why should the confidence coefficient be 99 per cent, it is 
hard to obtain an intelligible answer. And the same happens if, instead, the plan 
is defined in terms of any of the other usual concepts of sampling inspection, 
acceptable quality level or average outgoing quality limit or whatever. The 
quoted requirements are apparently arrived at by first deciding how much 
inspection is reasonable and convenient, and then determining how that much 
inspection can be called for by some prestigeful military or naval standard. 
What’s good enough for the Army, or Navy, is good enough for us. This may 
be sound politics, but it is poor science. 

Let us consider the economics of the problem. The Public Utility Corpora- 
tion is a regulated monopoly, and must work essentially on a “cost plus” basis. 
Any expense the corporation is put to in testing meters is an operating cost that 
will be admitted when charge rates are fixed. Of course, no business enterprise 
is static—the volume of business and the various costs change. The corpora- 
tion’s charge rates will not always be perfectly adjusted to the current operat- 
ing costs. But it must be roughly true to say that meter inspection is not a 
real financial liability to the corporation; it is paid for by the customer. So it is 
proper to ask: How much is it reasonable for a customer to pay to have his 
(and everyone else’s) meter inspected? 

The corporation’s receipts depend on the average speed of the meters, and 
because of the “cost plus” system of charging, it is unimportant whether the 
average speed is very close to the nominal speed or not—so long as changes in 
the speed distribution are slow. Thus by the accident of which meter he has, 
the customer partakes in a lottery in which he gains small sums (the under- 
charge in his bills) if his meter is slower than average and loses small sums 
(the overcharge in his bills) if his meter is faster than average. Provided the 
allocation of meters is purely haphazard, the lottery is “fair” in the sense that 
his expectation of gain is zero. The possible gains and losses are very small 
compared with the cost of living, and for such small sums the utility of money 
must surely be linear. It is hard to see why he should pay anything at all in
order to get out of this lottery. Inspection, if effective, will reduce the variance
of meter speed, and so reduce the stakes in the lottery. Why should he pay to 
have them reduced? 

Consider a grocery product, such as sugar. It would be easy enough for the 
sugar supplier to guarantee that the average net weight of his 1 lb. packages 
of sugar, as they left the factory, was slightly in excess of 1 lb., and easy 
enough for an inspecting agency to check the supplier’s good faith. It is less 
easy for the supplier to secure a very small variance of weight and to guarantee 
individual weights. Is it reasonable for the customer to pay (say) a penny a 
pound more for sugar in order that the variance of weight should be very small? 
If the variance in weight was rather large, some slight inconvenience might 
result. It is convenient for the housewife to be able to use the contents of a 
package when the recipe calls for 1 lb. of sugar, without verifying the weight 
of the package. But apart from possible considerations of convenience, there 
seems to be no good reason at all to pay more for sugar in order to receive each 
time more exactly the nominal amount—provided that the allocation of 
packages to purchasers is purely haphazard. 

Fear that the allocation, of meters or of pounds of sugar, as the case may 
be, to the customer will not be purely haphazard is a possible good reason for 
paying to get out of the lottery. If the sugar packages vary enough in weight, 
the retailer (or his assistant) may be tempted to screen the packages and 
place the heavier ones in reserve, whence they will be brought out for a 
suitable consideration. The customer either pays the bribe or receives under- 
weight. Effectively, the price of sugar has gone up. And the mere existence of a 
system of bribery may be sufficiently distasteful to the customer that he would 
pay something to have it abolished. Corruption once started may grow more 
enterprising; the sugar bags may be tampered with. A similar racket could 
conceivably be run by the service men who install electric meters. To say this is 
not to suggest that any such racket is in fact run, nor that grocery retailers and 
electric service men are any less honest than other people. (I have no shred of 
evidence to suggest either.) But behind the popular acceptance of rigorous 
standardization and testing, which must be paid for, there is undoubtedly a 
dim awareness that without such standardization the man in the street will 
usually, somehow or another, get the worst end of the stick. 

Let us consider how much the effective charge rate for electricity would be 
increased if a well organized system of bribery were in force, governing the 
supply of slow meters. The simplest situation would occur if one half of the 
meters ran at one speed, the other half at another speed, say respectively at 
(100 + ε) per cent and (100 − ε) per cent of the nominal rate. Let us suppose
that the corruption is strictly honorable in its way, and is conducted as follows. 
When a meter is to be installed or changed, the service man brings a fast meter 
and also a slow meter with him, tells the customer truthfully what the difference 
in speed is, asks him to figure out what difference that would make to his bills, 
and then offers to let him have the slower meter on payment of one half (to split 
the profits) of the difference. If the customer accepts the offer, he will altogether 
pay as if his meter ran at the 100 per cent rate, while if he does not accept the 
offer his meter will run at the (100 + ε) per cent rate. In the latter case, his
total payments will not be as much as ε per cent higher than in the former,
because charges are not simply proportional to consumption. The increase 
would be more like (3/4)ε per cent, perhaps. Provided that at least half the
customers would accept the slower meter on these terms, the result of dis- 
tributing all the meters would be that the corporation’s revenue would be 
correct according to the current rate of charge, but the customers, on the 
average, would pay more than this, approximately (3/8)ε per cent more.
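The arithmetic of the two-speed example can be checked in a few lines. This sketch follows the text's tentative assumptions: a fast meter raises a customer's total payments by about (3/4)ε per cent rather than ε per cent, and half the customers end up with fast meters. The function name is hypothetical.

```python
from fractions import Fraction


def average_excess(eps, pass_through=Fraction(3, 4), share_fast=Fraction(1, 2)):
    """Average percentage overpayment per customer in the two-speed example.

    Customers who take the slow meter and pay the bribe end up paying as at
    the 100 per cent rate, so they bear no excess. The others keep a fast
    meter, whose eps per cent overcharge raises their total payments by only
    pass_through * eps per cent.
    """
    return share_fast * pass_through * eps


print(average_excess(1))   # in units of eps
```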

Now the actual distribution of meter speed is not one of equal frequency on 
just two values. For any meter, let X denote the percentage difference in speed 
from the nominal. (We assume here, for simplicity, that each meter has a 
definite speed, although in fact the speed will depend to some extent on the 
load.) The frequency distribution for X over the meters of a given type and 
age is found usually to be centered close to 0 and to be bell-shaped with some 
negative skewness. Most meters run close to the nominal speed, a few are 
perceptibly fast, rather more are perceptibly slow, in proportions that depend 
greatly on the type. With such a distribution, the amount recoverable in 
bribes will depend on (a) what proportion of the customers are willing to par- 
ticipate and (b) the strategy of the service men; the result can vary within 
wide limits. The possibilities of (c) public indignation at the corruption and (d) 
a less honorable kind of corruption than that suggested above, are further 
factors to be considered. The consideration is certainly confusing, and one is 
tempted to dismiss the whole thing as a fantastic absurdity. Nevertheless, this 
is the nightmare that we pay to avoid when we pay for high standardization, 
and evaluation is not altogether foolish. If ½ℰ(|X|) per cent is suggested as the
effective increase in charges due to bribery, a good case can be made for saying 
that this is much too high, but not much of a case for its being too low. So let 
us suggest that 100 per cent inspection of meters, with rectification of those 
found to be perceptibly inaccurate, will be worth while and advisable whenever 
the percentage increase in charges due to the inspection is less than ½ℰ(|X|).

Let the average annual electricity bill per customer be $B, let the cost of testing a meter be $T, and let a tested meter be judged to be dependable for the next M years, after which time retesting may be considered. Then our suggested rule is that 100 per cent testing of meters is advisable whenever

$$\mathcal{E}(|X|) > \frac{200T}{MB}.$$

Thus if the average annual bill were $100, the testing cost were $4, and the dependability were judged at eight years, the break-even value for ℰ(|X|) would be 1. For any particular lot of meters in service, if the value of ℰ(|X|) were known, the appropriate decision could be made immediately: either inspect 100 per cent or do not inspect at all. Usually the value of ℰ(|X|) will have to be inferred by sampling inspection, and a suitable procedure would be as follows.
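The break-even rule can be read as a comparison of two percentages. Testing costs 100T/(MB) per cent of a customer's charges per year when a test remains valid for M years. The suggested bribery effect is half of ℰ(|X|) per cent. Setting the two equal gives the threshold 200T/(MB). The small Python sketch below (function name hypothetical) reproduces the worked example:

```python
def break_even_mean_abs_error(annual_bill, test_cost, years_dependable):
    """Threshold for E(|X|) above which 100 per cent testing is advised.

    Testing costs 100*T/(M*B) per cent of a customer's charges per year;
    setting this equal to half of E(|X|) gives E(|X|) = 200*T/(M*B).
    """
    return 200.0 * test_cost / (years_dependable * annual_bill)


# Worked example from the text: B = $100, T = $4, M = 8 years.
print(break_even_mean_abs_error(annual_bill=100, test_cost=4, years_dependable=8))
```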

Meters are inspected one by one at random, and the cumulative total of the
observed |X|'s (the absolute percentage speed errors) is plotted against the
number of meters tested. Inspection ceases if the sample path crosses a linear
boundary with slope equal to the above break-even value for ℰ(|X|). The
boundary is chosen to meet the horizontal axis at some suitable minimum value 
for the sample size. Discussion of the choice could go along the lines indicated 
elsewhere in this paper, though since the break-even slope has been determined 
only very roughly a summary decision concerning the minimum sample size 
would serve well enough. 
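The sequential procedure on |X| can be sketched as follows. Some details here are assumptions. The text does not say whether touching the boundary counts as crossing; this sketch stops the first time the cumulative sum falls to or below the line slope·(n − n_min). The function name and arguments are hypothetical.

```python
def variables_plan(abs_errors, slope, n_min):
    """Sequential sampling on |X|: a sketch of the procedure described in the text.

    The cumulative sum of |X| is compared with the line slope * (n - n_min), which
    has the break-even slope and meets the horizontal axis at n = n_min. Inspection
    ceases (the rest of the lot is left untested) the first time the path reaches
    the line. If it never does, every meter in the lot is tested.
    Returns (meters_tested, stopped_early).
    """
    total = 0.0
    for n, err in enumerate(abs_errors, start=1):
        total += err
        if total <= slope * (n - n_min):
            return n, True
    return len(abs_errors), False


# Hypothetical lot of 200 meters, each with |X| = 0.5, break-even slope 1, n_min = 20.
print(variables_plan([0.5] * 200, slope=1.0, n_min=20))
```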

Instead of recording the individual |X|'s, it would be possible, and perhaps easier, to classify each meter merely as “defective,” meaning that |X| exceeded some margin such as 2, or else “nondefective.” Suppose that “defective” meters are judged on the average to have |X| = 4, and “nondefective” meters to have |X| = 0.7. Then if P is the proportion of defective meters, we can reckon

$$\mathcal{E}(|X|) = 4P + 0.7(1 - P) = 0.7 + 3.3P.$$

If the break-even value for ℰ(|X|) is 1, the break-even value for P is roughly 0.1. The inspection plan of this paper may now be used. For N = 10,000 and k = 0.1N = 1,000, the recommended plan would have an initial sample of size 120 and successive samples of size 10.
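The figures in this paragraph can be reproduced directly. Solving 0.7 + 3.3P = 1 gives P ≈ 0.09, which the text rounds to 0.1, so that k = 0.1N. The sketch below (variable and function names hypothetical) recomputes the two sample sizes from α = 0.375/√k and β = 1/k.

```python
import math


def break_even_fraction(e_break=1.0, abs_x_defective=4.0, abs_x_ok=0.7):
    """Solve E(|X|) = abs_x_ok + (abs_x_defective - abs_x_ok) * P for P."""
    return (e_break - abs_x_ok) / (abs_x_defective - abs_x_ok)


P = break_even_fraction()           # about 0.09; the text rounds this to 0.1
N = 10_000
k = 0.1 * N                         # break-even number of defectives in the lot
first = 0.375 / math.sqrt(k) * N    # alpha * N, quoted as 120 in the text
later = N / k                       # beta * N
print(round(P, 4), round(first), round(later))
```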

The foregoing calculations are based on the assessment of the possible effect 
of bribery as a percentage increase of ½ℰ(|X|) in total payments for electricity.
As already indicated, the assessment is disputable on several grounds, and 
perhaps the ensuing calculations should not be taken too seriously. We shall 
be on much safer ground if we assert merely that the possible percentage in- 
crease in payments will be proportional to some linear measure of dispersion of 
the frequency distribution of X. It is also plausible to suggest that the prob- 
ability that bribery will actually occur, that the possibility will be realized, will 
be higher if the dispersion of X is higher and the possibility is more attractive 
to participants—whether or not we regard this probability as in any case very 
low. From these two remarks we can draw the following conclusions, which 
also summarize this discussion. 

(i) The principal reason for insisting on rigorous standardization and inspection 
(of electric meters or of weights of bags of sugar) is to prevent corrupt distribution. 
For this purpose, the dispersion of X should be kept small. All deviations of X from 
the norm are undesirable, whichever their sign. The existence of slow meters is as 
much contrary to the consumer’s interest as are fast meters (surprising though this 
may seem at first glance). 

(ii) A reasonable inspection policy for electric meters would be to make a sample 
inspection of every lot periodically to estimate ℰ(|X|). The available 100 per cent
inspection effort should then be directed against those lots for which ℰ(|X|) seems
to be highest. 

(iii) Talk about AQLs, AOQLs and lot tolerances confuses the real issue and is 
better avoided. 
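Policy (ii) amounts to estimating ℰ(|X|) for each lot from its periodic sample and ranking the lots. The minimal sketch below assumes that each lot's sample is given as a list of observed |X| values. It uses the break-even value 1 from the worked example. The lot labels and function name are hypothetical.

```python
from statistics import mean


def rank_lots(samples, break_even=1.0):
    """Order lots by estimated E(|X|), highest first.

    samples: dict mapping a lot label to a list of observed |X| values.
    Returns a list of (label, estimate, above_break_even) tuples.
    """
    estimates = {lot: mean(vals) for lot, vals in samples.items()}
    ranked = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    return [(lot, est, est > break_even) for lot, est in ranked]


# Hypothetical sample results for three lots of meters.
print(rank_lots({"A": [0.5, 0.7], "B": [1.5, 2.5], "C": [0.9, 1.1]}))
```

The 100 per cent inspection effort would go to the lots at the head of the list, starting with those whose estimate exceeds the break-even value.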


Proposals along the lines of (ii) have been put forward by some corporations. 


6. ACKNOWLEDGMENTS 


I am indebted to Professor Acheson J. Duncan for introducing me to electric 
meters, and to Mr. Charles L. Matz for much interesting information on that 
topic. A referee and my colleagues at Chicago and Princeton have made many
helpful suggestions and comments. Especially I am grateful to Professor
Leonard J. Savage for penetrating and stimulating discussion. 
APPENDIX 
Asymptotic results. Let us set

$$\alpha = \frac{a}{\sqrt{k}}, \qquad \beta = \frac{1}{k}, \qquad \lambda = k + t\sqrt{k}, \qquad y = z - a\sqrt{k},$$

and consider what happens to the expression (7) for C_λ when k → ∞, with a and t fixed (a > 0). Let us choose a fixed positive ε such that a/√ε is large. Let the range of summation for y in (7) be split into the two parts,


$$\text{(i)}\;\; 0 \le y \le [\varepsilon k - a\sqrt{k}], \qquad \text{(ii)}\;\; [\varepsilon k - a\sqrt{k}] + 1 \le y \le [k - a\sqrt{k}].$$


It is easy to show that the sum of f_{λ,y} over range (i) can be made as small as we please by choosing ε
small enough, and hence the contribution to (7) from range (i) is negligible.
For the range (ii) it is straightforward to show, using Stirling’s formula, that 


$$f_{\lambda,y} = a\sqrt{k}\, z^{-3/2} (2\pi)^{-1/2} \exp\left\{-\frac{a^2 k}{2z} - at - \frac{t^2 z}{2k}\right\}\left\{1 + O\left(k^{-1/2}\right)\right\}.$$


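Hence C_λ is approximated by

$$k + t\sqrt{k}\int_{\varepsilon k}^{k}\left(1 - \frac{z}{k}\right) a\sqrt{\frac{k}{2\pi}}\; z^{-3/2}\exp\left\{-\frac{a^2 k}{2z} - at - \frac{t^2 z}{2k}\right\} dz.$$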


After the substitution x = a√(k/z), this becomes


$$k + 2t\sqrt{k}\int_{a}^{a/\sqrt{\varepsilon}}\left(1 - \frac{a^2}{x^2}\right)\exp\left(-\frac{x^2}{2} - at - \frac{a^2 t^2}{2x^2}\right)\frac{dx}{\sqrt{2\pi}}.$$


Again, if ε is small, only a negligible difference is made if we replace the upper
limit of integration by ∞. We thus obtain a valid asymptotic expression for
C_λ. Condition (10) becomes


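$$\sup_{t>0}\; 2t\int_{a}^{\infty}\left(1 - \frac{a^2}{x^2}\right)\exp\left\{-\frac{1}{2}\left(x + \frac{at}{x}\right)^2\right\}\frac{dx}{\sqrt{2\pi}} = a.$$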


By numerical integration, this condition is found to be satisfied for a = 0.375
approximately, the value of t at which the supremum occurs being 1.6 roughly.
Values of ΔC_λ for a = 0.375 and various t are shown in Table 1.
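The asymptotic minimax condition can be checked numerically. The Python sketch below is not the author's computation, and its details are assumptions. It evaluates the left-hand side of the condition by Simpson's rule, truncating the infinite upper limit where the integrand is negligible. It finds the supremum over t by golden-section search, assuming a single interior peak, and then bisects on a. The right-hand side a is the regret at λ = 0, in units of √k, where only the first sample (cost kα = a√k) is inspected. The author reports a ≈ 0.375 with the supremum near t ≈ 1.6, and the sketch lets a reader confirm values in that neighbourhood.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=400):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def scaled_regret(a, t, span=12.0):
    """Asymptotic regret / sqrt(k) at lambda = k + t*sqrt(k), t > 0:
    2t * int_a^inf (1 - a^2/x^2) exp(-(x + a*t/x)^2 / 2) dx / sqrt(2*pi),
    with the upper limit truncated at a + span."""
    def integrand(x):
        return (1.0 - (a / x) ** 2) * math.exp(-0.5 * (x + a * t / x) ** 2)
    return 2.0 * t * simpson(integrand, a, a + span) / SQRT_2PI


def max_regret(a, t_lo=0.01, t_hi=6.0, iters=40):
    """Golden-section search for the supremum over t (assumes one interior peak)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = t_lo, t_hi
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = scaled_regret(a, c), scaled_regret(a, d)
    for _ in range(iters):
        if fc > fd:                      # peak lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = scaled_regret(a, c)
        else:                            # peak lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = scaled_regret(a, d)
    t_star = 0.5 * (lo + hi)
    return scaled_regret(a, t_star), t_star


def minimax_a(lo=0.05, hi=1.0, tol=1e-4):
    """Bisection for the a at which sup_t of the regret equals a (the regret at lambda = 0)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_regret(mid)[0] > mid:     # regret above k still too large: enlarge a
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(max_regret(0.375), minimax_a())
```

A library quadrature routine could replace the hand-written Simpson rule. The plain version keeps the sketch free of dependencies.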

Tabulations. Also shown in Table 1 are values of ΔC_λ for the minimax plans when k = 16 and 100 (the plans, that is, having β = 1/k and α chosen to minimize the maximum of ΔC_λ, very nearly). It will be seen that the asymptotic results are a good approximation, except when λ exceeds k + 2√k.

It is easy to modify the values of ΔC_λ computed for a given plan to obtain the values corresponding to the same plan but a different value of k. This is because k enters expression (7) in a very simple way. Suppose that k is believed to be 100, and therefore the plan α = 0.038, β = 0.01 is installed. The losses occurring when k is in fact 80 or 125 are shown in Table 2. Now if the value of k is misjudged, the error is more likely to be due to misjudging the bad effect of passing a defective than to misjudging the cost of inspection proper. Therefore in Table 2 the cost unit has been so adjusted that 100 per cent inspection always costs 100 units, whatever the value of k; i.e., the value of ΔC_λ given by (8) has been multiplied by 100/k.

TABLE 1. EXPECTED LOSSES FOR MINIMAX LINEAR SEQUENTIAL PLANS
(Columns give λ and ΔC_λ for three cases: k = 16, α = 0.0969, β = 0.0625, with λ from 10 to 40 and maximum ΔC_λ = 1.55; k = 100, α = 0.0380, β = 0.0100, with λ from 85 to 160; and k very large, α = 0.375/√k, with λ expressed relative to k in multiples of √k. The individual entries are not legible in the source scan.)

TABLE 2. EFFECT OF MISJUDGING k ON EXPECTED LOSSES
(Columns for k = 80 and k = 125, both using the plan α = 0.038, β = 0.01 designed for k = 100; the k = 125 column gives 0.8ΔC_λ. The individual entries are not legible in the source scan.)

For k large, the situation is as follows. If k is misjudged by a proportional error p and the supposed minimax plan is installed, the resulting loss (regret) will have its maximum for λ somewhere near the supposed value of k. At the supposed value the loss is roughly ½pk, which is of the order of k, not of √k, if k is large.

The minimax principle. From the tabulations we can judge the reasonable- 
ness of the minimax principle, as it has been applied here. A consideration of 
Fig. 1 should convince the reader that the minimax principle is very reasonable when k
is as low as 16. It is still reasonable when k = 100. But if k is several hundred
(or higher) the principle is less satisfying. It seems to presuppose that we attach substantial probability to values of λ in the interval (k, k + 5√k), which
is very short relative to the interval (0, k), against which, according to the 
minimax principle, it is weighed. Guthrie and Johns [7] have argued well the 
corresponding point for single sampling plans. 

I suspect that values of k much above 100 are rare in practice, and therefore 
little harm is done by recommending the minimax plan. For the hypothetical 
example of the electric meters, where k= 1000, the minimax plan suggested is 
not too sound from a coolly economic point of view, if we grant (what is not 
really the case) that k is known accurately—the first sample of size 120 is too 
large. But the minimax method of allowing for our uncertainty concerning the 
value of λ (or Y) is probably rather attractive from a legal point of view;
it has a sort of impartiality, and the economic error that it commits will prob- 
ably be felt to be in the right direction (unnecessary inspection cost, charged in 
the first place to the corporation). So even here the minimax plan is perhaps 
the best to recommend. 

Comparison with single sampling. Consider the single sampling plan: inspect a random sample of size αN; if the number of defectives found does not exceed c, discontinue inspection; otherwise, inspect the rest of the lot. For this plan, the formula corresponding to (7) above is

$$C_\lambda = k + (\lambda - k)(1 - \alpha)\sum_{y=0}^{c} \frac{e^{-\alpha\lambda}(\alpha\lambda)^y}{y!}. \qquad (11)$$

ΔC_λ has three maxima, one at λ = 0 and the other two near λ = k; for the minimax plan all three are equal, as nearly as the discreteness of c permits.

For k = 100, we have seen that the minimax sequential plan, with α = 0.0380 and β = 0.01, has maximum loss equal to 3.80 approximately. The minimax single sampling plan is found to have α = 0.0658 and c = 6, and the maximum loss is 7.10, about 1.9 times that for the sequential plan. The graphs of ΔC_λ are compared in Fig. 2.
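Formula (11) makes the single-sampling comparison easy to reproduce. The sketch below assumes that the regret ΔC_λ is C_λ minus the smaller of λ and k (the cost of the better of the two pure decisions). This definition gives kα at λ = 0. The sketch then scans λ on a grid; the function names are hypothetical. For α = 0.0658, c = 6 and k = 100 it should give a maximum close to the 7.10 quoted.

```python
import math


def single_sampling_cost(lam, k, alpha, c):
    """Expected cost C_lambda from formula (11), in units where inspecting the
    whole lot costs k and passing one defective costs 1."""
    mu = alpha * lam                      # Poisson mean of defectives in the sample
    p_stop = sum(math.exp(-mu) * mu ** y / math.factorial(y) for y in range(c + 1))
    return k + (lam - k) * (1.0 - alpha) * p_stop


def single_sampling_regret(lam, k, alpha, c):
    """Assumed regret: expected cost minus min(lam, k), the better pure decision."""
    return single_sampling_cost(lam, k, alpha, c) - min(lam, k)


def max_single_sampling_regret(k, alpha, c, step=0.1):
    """Largest regret over a grid of lambda in [0, 3k]; returns (regret, lambda)."""
    grid = (i * step for i in range(int(round(3 * k / step)) + 1))
    return max((single_sampling_regret(lam, k, alpha, c), lam) for lam in grid)


if __name__ == "__main__":
    k, alpha, c = 100, 0.0658, 6
    print(single_sampling_regret(0.0, k, alpha, c))   # equals k * alpha at lambda = 0
    worst, where = max_single_sampling_regret(k, alpha, c)
    print(round(worst, 2), where, round(worst / 3.80, 2))
```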

Asymptotically for k large, the minimax single sampling plan is easily found to have

$$\alpha = 0.307\,k^{-1/3}, \qquad c = [\alpha k],$$

and the minimized maximum loss ΔC_λ is 0.307k^{2/3}, realized when λ = 0 and when λ = k + 1.4k^{2/3}, roughly. This loss differs from the minimax loss of 0.375k^{1/2} for the recommended linear sequential plan by the factor 0.82k^{1/6}, which takes the value 2.5 when k = 800 and 3.0 when k = 2400.

Fig. 2. Expected loss or regret, ΔC_λ, is plotted as ordinate against λ as abscissa for two plans when k = 100: (i) single sampling plan, α = 0.0658, c = 6; (ii) linear sequential plan, α = 0.0380, β = 0.0100.
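The asymptotic comparison is simple arithmetic on the two minimax losses, 0.307k^{2/3} and 0.375k^{1/2}, whose ratio is about 0.82k^{1/6}. The short Python check below uses hypothetical function names:

```python
def single_sampling_alpha(k):
    """Asymptotic minimax sample fraction for single sampling: 0.307 k^(-1/3)."""
    return 0.307 * k ** (-1.0 / 3.0)


def loss_ratio(k):
    """(0.307 k^(2/3)) / (0.375 k^(1/2)) = 0.8187 k^(1/6), quoted as 0.82 k^(1/6)."""
    return (0.307 * k ** (2.0 / 3.0)) / (0.375 * k ** 0.5)


for k in (100, 800, 2400):
    alpha = single_sampling_alpha(k)
    # c = [alpha * k], the integer part
    print(k, round(alpha, 4), int(alpha * k), round(loss_ratio(k), 2))
```

At k = 100 the asymptotic α (about 0.066) and c = 6 are close to the exact minimax values 0.0658 and 6.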

It may be noted that the asymptotic results of Moriguti and Breakwell (see 
references in [1]) differ from the above in the numerical factors, being other- 
wise similar, because their cost function differs in an important detail from 
(11). See also Steinhaus [16] and Drobot and Warmus [6]. 

Summary of principal notation 

N = lot size.

Y = number of defectives initially in the lot.
λ = Poisson expectation of Y.

k = break-even value for Y or λ.

φ = proportion of the lot inspected at any stage.
y = number of defectives found at any stage.

α, β = constants defining the sequential plan.

ΔC_Y = regret for fixed Y.

ΔC_λ = regret for fixed λ.

α, c = constants defining a single-sampling plan.

X = percentage speed error of an electric meter, or weight error of a bag of sugar.
a is used in different senses in Section 2 and in the Appendix.
ε is used in different senses in Section 5 and in the Appendix.


REFERENCES 


Papers on inspection are crudely classified according to major interest, as follows: 


P: process control. 
RC: rectifying inspection of a continuous output. 
NC: non-rectifying (acceptance) inspection of a continuous output. 
RL: rectifying inspection of lots. 
NL: non-rectifying (acceptance) inspection of lots. 










(a) Supplementary bibliography of economic analysis of inspection 


[1] Anscombe, F. J., “Rectifying inspection of a continuous output,” Journal of the
American Statistical Association, 53 (1958), 702-19. (RC) 

[2] Barnard, G. A., “Control charts and stochastic processes,” Journal of the Royal Statistical Society, Series B, 21 (1959), 239-71. (P)

[3] Cox, D. R., “Serial sampling acceptance schemes derived from Bayes’ theorem,” 
Technometrics, 2 (1960), 353-60. (NC) 

[4] Davies, O. L., “Some statistical aspects of the economics of analytical testing,” Tech- 
nometrics, 1 (1959), 49-61. (NL) 

[5] DeGroot, M. H., “Minimax sequential tests of some composite hypotheses,” Annals 
of Mathematical Statistics, 31 (1960), 1193-1200. (NL) 

[6] Drobot, S. and Warmus, M., Dimensional Analysis in Sampling Inspection of Merchandise (Rozprawy Matematyczne, 5), Warszawa, 1954. (NL)

[7] Guthrie, D. and Johns, M. V., “Bayes acceptance sampling procedures for large lots,” 
Annals of Mathematical Statistics, 30 (1959), 896-925. (NL, RL) 

[8] Hald, A., “The compound hypergeometric distribution and a system of single sam- 
pling inspection plans based on prior distributions and costs,” Technometrics, 2 
(1960), 275-340. (RL) 

[9] Hamaker, H. C., “Some basic principles of sampling inspection by attributes,”
Applied Statistics, 7 (1958), 149-59. (NL, RL) 

[10] Hill, I. D., “The economic incentive provided by sampling inspection,” Applied Statistics, 9 (1960), 69-81. (NL)

[11] Lechner, J. A., “Optimum decision procedures for a Poisson process parameter,” 
Annals of Mathematical Statistics (to appear). (NL) 

[12] Robbins, H., “Sequential estimation of the mean of a normal population,” Probability 
and Statistics, the Harald Cramér Volume (ed. U. Grenander). New York: John Wiley 
& Sons, Inc., 1959, pp. 235-45. (NL) 

[13] Rusch, E., “Kostenvergleiche zwischen verschiedenen Revisionsverfahren der Technik,” Zeitschrift für wirtschaftliche Fertigung, 53 (1958), 281-6, 311-4; 54 (1959),
28-34, 127-32, 163-71, 208-16. (NL, RL) 

[14] Savage, I. R., “A production model and continuous sampling plan,” Journal of the 
American Statistical Association, 54 (1959), 231-47. (P) 

[15] Schlaifer, R., Probability and Statistics for Business Decisions. New York: McGraw-Hill Book Company, Inc., 1959. Chapters 22-24, 33-36. (RL, NL)

[16] Steinhaus, H., “Statistical appraisal,” Colloquium Mathematicum, 2 (1951), 313-4. 
(NL)* 

[17] Suzuki, Y., “On sampling inspection plans,” Annals of the Institute of Statistical Mathe- 
matics, 11 (1959), 71-9. (NL) 

[17a] Taguti, G., “Sampling inspection plan by Bayes’ estimate,” Reports of Electrical 
Communication Laboratory, Nippon Telegraph and Telephone Public Corporation, 6 
(1958), 401-13. (NL) 

[18] Vagholkar, M. K., “The process curve and the equivalent mixed binomial with two 
components,” Journal of the Royal Statistical Society, Series B, 21 (1959), 63-6. (NL) 

[19] Vagholkar, M. K. and Wetherill, G. B., “The most economical binomial sequential 
probability ratio test,” Biometrika, 47 (1960), 103-9. (NL) 

[20] Waerden, B. L., van der, “Sampling inspection as a minimum loss problem,” Annals 
of Mathematical Statistics, 31 (1960), 369-84. (RL, NL) 

[21] Wetherill, G. B., “The most economical sequential sampling scheme for inspection 
by variables,” Journal of the Royal Statistical Society, Series B, 21 (1959), 400-8. 
(NL) 

[22] Wetherill, G. B., “Some remarks on the Bayesian solution of the single sample in- 
spection scheme,” Technometrics, 2 (1960), 341-52. (NL) 








* This summary in English of work published elsewhere in Polish establishes Steinhaus as one of the inde- 
pendent originators of the economic analysis of inspection. 













(b) Other references 


[23] Anscombe, F. J., “Linear sequential rectifying inspection for controlling fraction de- 
fective,” Supplement to the Journal of the Royal Statistical Society, 8 (1946), 216-22. (RL) 

[24] Duncan, A. J., Quality Control and Industrial Statistics. Homewood, Ill.: Irwin, rev.
ed., 1959. Chap. 16. (RL) 

[25] Hill, I. D., Horsnell, G., and Warner, B. T., “Deferred sentencing schemes,” Applied 
Statistics, 8 (1959), 76-91. (NC) 

[26] Plackett, R. L., “Boundaries of minimum size in binomial sampling,” Annals of 
Mathematical Statistics, 19 (1948), 575-80. 

[27] Wurtele, Z. S., “A rectifying inspection plan,” Journal of the Royal Statistical Society, Series B, 17 (1955), 124-7. (RL)











RESIDENCE HISTORIES AND EXPOSURE RESIDENCES 
FOR THE UNITED STATES POPULATION 


KARL E. TAEUBER*
University of Chicago

WILLIAM HAENSZEL
Biometry Branch, National Cancer Institute
AND
MONROE G. SIRKEN
National Vital Statistics Division


Residence histories were collected in a supplement to the Current 
Population Survey. By restricting query to places rather than indi- 
vidual dwellings, complete histories were obtainable for ninety per cent 
of the sample. The concept of “exposure residence” is suggested as one 
technique for summarizing residence histories. Exposure residence data 
for the United States adult population are presented, and are used to 
support the hypothesis of stage patterning of rural-urban migration. 


Data from the decennial censuses and the Current Population Survey permit the study of net migration and interchanges and redistributions of population over a few fixed time intervals. None of these data permits the delineation of specific individual acts of migration, and none relates to each other
the successive moves of individual migrants. Students of migration have long 
been aware of the need for longitudinal data, and Bogue [2, p. 31] has recently 
listed such information as part of “an ideal system of census statistics for 
measuring internal migration.” 

The perfect or ideal system of migration analysis would be achieved if, for each 
individual, the census enumerated a complete migration history, obtaining dates of 
arrival and departure from each community in which the person had lived. Such a task is obviously beyond the means of any census, and would produce a greater
mass of data than could ever be used. 


It has proved feasible to collect modified migration histories for a national 
sample of the United States population. The first part of this paper describes 
the procedures that made possible the collection of these data. The second sec- 
tion describes a summary measure that was developed especially for the anal- 
ysis of residence histories in a national study of lung cancer mortality. The third 
section indicates how this summary measure may be given sociological mean- 
ing, and presents data illustrating its use in the study of migration. The con- 
cluding section discusses the findings, and suggests other ways in which resi- 
dence history data can contribute to the study of migration and population re- 
distribution. 


1. THE RESIDENCE HISTORY SCHEDULE 


The residence history data were collected for use in a lung cancer mortality study. The National Cancer Institute, in cooperation with the National Office of Vital Statistics and the Bureau of the Census, sponsored a Residence History and Smoking Habits Schedule as a supplement to the May, 1958, Current Population Survey (CPS). The CPS is a national sample survey conducted by the Bureau of the Census. Completed interviews are obtained each month from about 35,000 households. Sample data are inflated to estimated national totals by the Bureau of the Census [14]. The residence history data were obtained for the civilian non-institutional population aged 18 and over, representing about 100,000,000 persons.¹

* Formerly with the Biometry Branch of the National Cancer Institute.

EXPOSURE RESIDENCES FOR THE UNITED STATES 825

Questioning of respondents began with the place of current residence and worked back to the place of birth. After recording the name of the current place of residence, the respondent was asked, “How many years have you lived continuously in ______?” If the answer was “always,” the residence history was complete. Otherwise, the name of the preceding place was requested, along with the length of continuous residence there. In this manner, up to four residences, including the current place, were ascertained. At this point, if the residence history was not yet complete, the place of birth and the length of residence there were obtained.² Thus the residence histories include the place of birth, the place of current residence, and up to three places preceding the current residence, with the duration of continuous residence at each of the five places.
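The backward-working interview can be mimicked in a few lines. This is a hypothetical illustration, not the Census Bureau's processing: it assumes a person's true history is available as a chronological list of `Residence` records (a type invented here) and shows which places the five-place schedule would capture.

```python
from dataclasses import dataclass


@dataclass
class Residence:
    """One place of residence (hypothetical record type)."""
    place: str    # place name, e.g. "Topeka, Kansas"
    years: float  # length of continuous residence there


def record_schedule(history):
    """Return (recorded, complete) for a chronological `history`.

    `history` runs from the birthplace (first) to the current place (last).
    `recorded` lists what the schedule captures, in interview order: the
    current place, up to three preceding places, then the birthplace.
    `complete` is False when intermediate places had to be skipped.
    """
    if not history:
        raise ValueError("history must include at least the current place")
    backward = history[::-1]  # interviewing starts at the current place
    if len(backward) <= 5:
        # Four places asked in sequence, plus the birthplace, cover everything.
        return backward, True
    # Current place and three preceding places, then jump to the birthplace.
    return backward[:4] + [history[0]], False


# Six places, P0 being the birthplace: P1 is never asked about.
six = [Residence(f"P{i}", 2.0) for i in range(6)]
recorded, complete = record_schedule(six)
print([r.place for r in recorded], complete)  # ['P5', 'P4', 'P3', 'P2', 'P0'] False
```

A person with five or fewer places is recorded completely; with six or more, the places between the birthplace and the third place before the current one are lost.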

The residence histories are incomplete in three respects. First, persons with 
six or more residences were not queried on their residences between leaving the 
place of birth and entering the third place prior to the current place. These per- 
sons are separately classified in the following analysis as “frequent migrants.” 
Second, certain residences were deliberately omitted from the histories. With 
the exception of the places of birth and of current residence, a place was not 
recorded unless the duration of residence there was at least one year. Residences 
outside the United States were also excluded, except that place of birth could 
be listed as “abroad.” 

A third type of incompleteness in the histories arises because they were 
gathered with respect to places of residence, not with respect to all the dwelling 
units in which the person ever lived. A previous residence was not counted as a 
separate place unless it was in a different political unit—an incorporated city 
or county. Moves within a city were ignored, as were moves within the rural 
portions of a single county. In contrast to the usual Census practice with migra- 
tion tabulations, however, moves within a county but crossing city boundaries 
were recorded. Change of place, as thus defined, conforms as nearly as prac- 
ticable with the sociological definition of migration as involving a change of 
community [1]. 
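The place rule and the exclusions just described can be expressed as a small filter. This sketch rests on assumptions of its own: dwellings are represented by a hypothetical `Dwelling` record giving state, county, and incorporated city (or `None` for rural or unincorporated territory), an incorporated city counts as one place regardless of county lines, and the one-year minimum and the exclusion of residences abroad are applied after consecutive dwellings in the same place have been merged.

```python
from typing import NamedTuple, Optional


class Dwelling(NamedTuple):
    state: str
    county: str
    city: Optional[str]   # None for rural or unincorporated territory
    years: float
    abroad: bool = False


def place_key(d):
    """Political unit that defines a 'place' (hypothetical encoding)."""
    if d.abroad:
        return ("abroad",)
    if d.city is not None:
        return ("city", d.state, d.city)    # moves inside a city are ignored
    return ("rural", d.state, d.county)     # moves inside a county's rural part are ignored


def to_places(dwellings):
    """Collapse a chronological list of dwellings into recorded places."""
    merged = []
    for d in dwellings:
        key = place_key(d)
        if merged and merged[-1][0] == key:
            merged[-1][1] += d.years        # same place: accumulate the time
        else:
            merged.append([key, d.years])
    last = len(merged) - 1
    kept = []
    for i, (key, years) in enumerate(merged):
        endpoint = i in (0, last)           # birthplace or current place
        if endpoint or (years >= 1 and key != ("abroad",)):
            kept.append((key, years))
    return kept


# Illustrative history (places chosen only for the example).
hist = [
    Dwelling("KS", "Shawnee", None, 12),       # a farm in rural Shawnee County
    Dwelling("KS", "Shawnee", None, 3),        # another farm, same county: same place
    Dwelling("KS", "Shawnee", "Topeka", 0.5),  # crosses a city boundary, but under 1 year
    Dwelling("MO", "Jackson", "Kansas City", 4),
    Dwelling("MO", "Jackson", "Kansas City", 6),  # move within a city: same place
]
print(to_places(hist))
# [(('rural', 'KS', 'Shawnee'), 15), (('city', 'MO', 'Kansas City'), 10)]
```

Note that the move from rural Shawnee County into Topeka does count as a change of place under this rule; it is dropped here only because it lasted less than a year.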

The residence histories are relatively complete records of the intercommunity migrations of the respondents. Obviously, local moves within a place, especially in large cities, may sometimes involve a sociologically more significant change in community than is involved in some changes of place. Similarly, some temporary residences in another place or abroad may have great significance in the life history of the individual. With respect to general sociological investigation, the histories delimit the major residential experience of most of those queried.

1 Corresponding residence history data were collected for a national sample of decedents from lung cancer. The collection of residence histories for decedents has been described elsewhere [9].

2 Copies of the schedule may be obtained from the Biometry Branch, National Cancer Institute, Bethesda 14, Maryland.

826 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961

For more than 90 per cent of the persons 18 and over responding to the resi- 
dence history questions, the reported places constitute a complete residence 
history. Of the total CPS population, 1.3 per cent have inadequately reported 
residence histories, and 8.8 per cent reported six or more residences. 
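The two ways of stating completeness agree, as a quick check shows. The sketch assumes that the “more than 90 per cent” figure is computed among persons with adequately reported histories, that is, after setting aside the 1.3 per cent.

```python
inadequate = 1.3   # per cent of the CPS adult population with inadequate reports
frequent = 8.8     # per cent reporting six or more residences

# Complete histories as a share of all adults, then of adequate reporters.
complete_share_all = 100 - inadequate - frequent
complete_share_reporting = 100 * complete_share_all / (100 - inadequate)

print(f"{complete_share_all:.1f}")        # 89.9
print(f"{complete_share_reporting:.1f}")  # 91.1
```

Roughly ninety per cent of the whole sample, and a little over ninety per cent of those who reported adequately, have complete histories.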

That a short residence history schedule can be sufficient for obtaining complete histories from such a high proportion of the population is surprising.³ The omission of temporary residences and of local moves permits this high degree of completeness.

As in any survey of the past behavior of individuals, it may be assumed that 
some events which should have been reported were not completely and ac- 
curately recalled or known. However, the schedule considered each residence in 
sequence, and information was requested specifically about it. The accuracy of 
reporting should therefore be considerably higher than would be obtained from 
a single question on place of residence at some distant time in the past. 

The analysis of the residence histories in Section 3 of this paper focuses on differentials by age and size of place, rather than on absolute rates of movement. Two considerations support an assumption that the data are sufficiently accurate for this type of analysis. First, on the basis of its experience with enumeration and coding, the Bureau of the Census reports no evidence of unusual difficulty with the schedule. Second, the data reveal definite and consistent patterns which conform in many instances to what is known from other sources. In a recent analysis of duration of residence at the current place, based on these data, correspondences were noted with the CPS annual mobility data, as well as with the known patterning of migrations of Negroes to Northern metropolises [12].


2. THE CONCEPT OF “EXPOSURE RESIDENCES” 

Bogue’s dictum that complete residence histories include a greater mass of 
data than can be analyzed poses a serious problem. Portions of the data must be 
selected to suit the limited needs of any particular analysis. A previous paper 
utilized only the data on duration of residence at the current place [12]. This 
paper utilizes the concept of “exposure residences” for summarizing life-time 
residential experience. The concept is analogous to, and derived from, the con- 
cept of “exposure occupations” [3, 11]. 

An individual is assigned an exposure residence in a size-of-place interval if 
he has lived for ten or more years in places within that interval. The residence 
does not have to be continuous nor does it have to be in only one place. For this 
study, six size-of-place intervals are recognized: rural farm; rural nonfarm; 
2,500-9,999; 10,000-49,999; 50,000-499,999; and 500,000 and over.⁴





3 The surprise of such a finding may reflect some facets of the sociology of demographic knowledge. Demographers, as professionals, are members of the most mobile occupational group. They are also likely to be accustomed to the postwar phenomenon of approximately 20 per cent of the population changing residence each year, without sufficient awareness of the factors contributing to this high rate of mobility [15]. Because of local moving and repeat migration, a high degree of residential stability for most of the population is not at all inconsistent with the annual mobility rates.

4 Both the length of residence necessary to constitute an exposure residence and the size-of-place intervals can be varied in accord with the purposes of the study. Thus, in a Pennsylvania pre-test for the national lung cancer







Some examples may clarify the definition of exposure residences. A person 
who has lived his entire life on a single farm is assigned to the category “one 
exposure residence, farm,” as is a person who has lived on several different 
farms, but never in any nonfarm places. Correspondingly, a person who has 
lived his entire life in New York City (and/or in Chicago) is assigned to the 
category “one exposure residence, 500,000+.” A person aged 35 who lived on a
farm for his first twenty years, and has lived in Topeka, Kansas, for the last
fifteen years, is classified as “two exposure residences, farm and 50,000-.”
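The definition amounts to summing years by size-of-place class and keeping the classes that reach the threshold. A minimal sketch, assuming a history is given as hypothetical `(size_class, years)` pairs with size classes already assigned from 1950 populations. The `threshold` argument reflects the note that the required length of residence can be varied (five years in the Pennsylvania pre-test). Frequent migrants, whose histories are incomplete, are not handled.

```python
from collections import defaultdict

# The six size-of-place intervals used in the study.
SIZE_CLASSES = ("rural farm", "rural nonfarm", "2,500-9,999",
                "10,000-49,999", "50,000-499,999", "500,000+")


def exposure_residences(history, threshold=10.0):
    """Size classes in which `history` accumulates at least `threshold` years.

    Years need not be continuous or spent in a single place: all time in
    places of the same size class is pooled.
    """
    totals = defaultdict(float)
    for size_class, years in history:
        if size_class not in SIZE_CLASSES:
            raise ValueError(f"unknown size class: {size_class!r}")
        totals[size_class] += years
    return frozenset(s for s, t in totals.items() if t >= threshold)


# The three examples in the text; durations for the first two are made up.
several_farms = [("rural farm", 30), ("rural farm", 15)]
new_york_and_chicago = [("500,000+", 20), ("500,000+", 25)]
topeka = [("rural farm", 20), ("50,000-499,999", 15)]

print(len(exposure_residences(several_farms)))  # 1
print(len(exposure_residences(topeka)))         # 2
```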

To avoid a very complex coding operation, all urban places were coded according to their 1950 population size class. Thus some of the residential experience here allocated to cities of a given size took place before the cities had attained that size. The further back the residential experience, the less accurate is the characterization of city size by 1950 data. The “old” (1940) definition of urban places was used, to conform to National Office of Vital Statistics procedures for coding size of usual place of residence. In general, only incorporated places of over 2,500 population are recognized as urban. Unincorporated places and unincorporated portions of the urbanized areas delineated in 1950 are coded as rural nonfarm. Unfortunately, there is no way with these data to distinguish residential exposure in suburban areas from exposure in nonmetropolitan areas.

The exposure residence concept is quite different from any direct migration measure. Consider a person with a complex residence history: he was born in a large city, moved to a farm at the age of 9, moved to a medium-sized city at the age of 14, moved to another farm at age 21 and stayed there three years, moved to a small town for nine years, and then moved to a farm, where he lived for three years before being enumerated. His farm residence totals 5 + 3 + 3 = 11 years, while he spent less than ten years in each of the other sizes of place, so this person would be coded as having “one exposure residence, farm.” All residences of the same size are grouped, and the time sequence of moves is lost. The example also illustrates that persons with diverse residence histories may be coded identically on exposure residences.
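The loss of sequence is easy to see by coding the person just described alongside a lifelong farm resident. The size classes assigned below to the “large city,” “medium-sized city,” and “small town” are assumptions (the text does not give them), and the stayer's history is hypothetical.

```python
from collections import Counter


def exposure_residences(history, threshold=10):
    """Pool years by size class; keep classes with at least `threshold` years."""
    totals = Counter()
    for size_class, years in history:
        totals[size_class] += years
    return frozenset(s for s, t in totals.items() if t >= threshold)


# The person described in the text, as (size class, years) in the order lived.
mover = [
    ("500,000+", 9),         # born in a large city, left at age 9 (class assumed)
    ("rural farm", 5),       # farm, ages 9-14
    ("50,000-499,999", 7),   # medium-sized city, ages 14-21 (class assumed)
    ("rural farm", 3),       # another farm, ages 21-24
    ("2,500-9,999", 9),      # small town, ages 24-33 (class assumed)
    ("rural farm", 3),       # farm, ages 33-36, at enumeration
]
stayer = [("rural farm", 36)]  # hypothetical lifelong farm resident

# Both code as "one exposure residence, farm", and so does the mover's
# history with its order reversed: the sequence of moves plays no part.
assert exposure_residences(mover) == exposure_residences(stayer) == frozenset({"rural farm"})
assert exposure_residences(mover[::-1]) == exposure_residences(mover)
```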

The size of place in which a person lives is an index of many features of his 
social environment [4, 5, 6]. Many of the effects of this environment on the 
individual require time to become fully operative. The concept of exposure 
residences may be regarded as an indicator of the individual’s lifetime exposure 
to various social environments. Although the exposure residence concept has 
shortcomings as a measure of migration, it has a distinct advantage in its 
dependence on the entire residence history. The assumption that exposure resi- 
dences are a sociologically meaningful measure of migration patterns underlies 
the analysis in the following section. 


3. EXPOSURE RESIDENCES, MIGRATION, AND RESIDENTIAL STABILITY 


Of the United States civilian population aged 18 and over, 63 per cent are reported as having one exposure residence (Table 1).⁵ A single exposure residence is not inconsistent with a large amount of residential mobility, both




study [11], five or more years were deemed sufficient for an exposure residence, and ten size intervals were recog- 
nized. 

5 The data in this section are based on preliminary tabulations. The final tabulations differ only slightly. The 
percentages cited are based on national totals estimated from the CPS sample. 







TABLE 1. NUMBER OF EXPOSURE RESIDENCES, BY AGE,
UNITED STATES, 1958

                                              Age
Number of exposure residencesᵃ     18-44     45-64     65 and over

                           (Percentage distribution)
Total reporting                     100       100       100
1
2
3
4 or 5
None
Frequent migrantᵇ

a Number of the six size-of-place categories in which the person has accumulated at least 10 years total residence.
b Six or more residences in residence history; number of exposure residences not coded.


within and between places. It is likely, however, that most of these persons do have long exposure in a single place.⁶

Another 23 per cent of adults are reported as having two exposure residences. 
These data demonstrate a strong pattern of a particular type of residential sta- 
bility. More than 86 per cent of the adult population have lived for as much as 
ten years in only one or two size-of-place categories within the United States. 

Most of the remainder of the population are here classified as “frequent 
migrants,” that is, having six or more residences. Because there was room for 
only five residences on the schedules, the residence histories recorded for these 
persons are incomplete, and no exposure residences were coded for them. It is 
probable that some of these persons have moved mainly between places within 
one or two size classes, so that if their residence histories were completed and 
exposure residences coded, they too might have only one or two exposure 
residences. 

Few persons are reported as having three, four, or five exposure residences. 
Many of the persons who have moved often enough to have lived for more than 
ten years in places within each of several size intervals are likely to have moved 
at least five times, so that they are here classified as “frequent migrants.” 

The category “no exposure residences” includes two main types of residence histories. Young persons who have moved two or three times may not yet have had time to accumulate the ten years necessary for an exposure residence. Most of the persons in this category, however, are foreign-born; because their residential experience outside the United States is not counted in computing exposure residences, they may similarly have had too little time to accumulate the necessary ten years.⁷





6 This is indicated by other data from tabulations of the Residence History Supplement [12]. Of the adult population, 26 per cent report that they have always lived in their current place, and a total of 66 per cent report ten or more years in their current place.

7 Tabulations by nativity support this interpretation of the “no exposure residences” category. However, some 
of the persons classified thus may have incompletely reported residence histories, whether because of frequent 
temporary residences, omitted by definition; long periods of residence abroad, also omitted by definition; or errors 
at some stage in the data collection and processing. 







The exposure residence measure varies with the total length of time lived in 
all places, i.e., age. With increasing age, the proportions with two or three ex- 
posure residences increase, as does the proportion of frequent migrants. Never- 
theless, at each age, more than 80 per cent of the population remain concen- 
trated in the categories of one or two exposure residences. 

These patterns are affected by the differing proportions of foreign-born 
among the age groups, ranging from about 5 per cent among the youngest age 
group to 21 per cent among those 65 and over. The early residential experience 
of the foreign-born is omitted. Largely as an artifact of this coding procedure,
the younger foreign-born have a high proportion with no exposure residences, 
while the older are heavily concentrated in the one exposure category. Thus a 
pattern of increasing number of exposure residences with age among the native- 
born is partially obscured by the patterns among the foreign-born. 

The data suggest a concentration of migration in the young adult ages and 
increasing residential stability with age. However, the age-specific rates of mi- 
gration undoubtedly have been changing, so that the cross-sectional age pat- 
terns presented here are inadequate for inferring the age patterns for any indi- 
vidual cohort. Further examination of these patterns would require tabulations 
of age by exposure residences for each age cohort. 

For those persons with only one exposure residence, the distribution by size 
of place is given in the first column of Table 2. The second column gives the dis- 
tribution of the 1950 Census population by size of place. For rural nonfarm 
places, and for each of the two smaller city-size groups, larger proportions of 
the population reside there than have experienced single exposure residences 
there. This excess reflects the higher growth rate of these places in recent years. 
These places include some recent migrants, who have not yet accumulated ten 
years exposure. They also include persons who, while having accumulated an 
exposure residence in these sizes, had previously accumulated an exposure resi- 


TABLE 2. DISTRIBUTION OF POPULATION BY SIZE OF PLACE OF
RESIDENCE, ALL AGES, 1950, AND BY SIZE OF PLACE OF
SINGLE EXPOSURE RESIDENCE,ᵃ AGES 18 AND OVER,
UNITED STATES, 1958

                                   Place of one           Place of
Size of place                      exposure residence     residence
                                   (1958 CPS)             (1950 Census)

                           (Per cent distribution)
Total, all sizes of place
Total, one exposure residence
500,000 and over
50,000-499,999
10,000-49,999
2,500-9,999
Rural nonfarm
Rural farm

a Size of place in which the person has lived 10 or more years, for persons who have accumulated 10 or more years' residence in only one of the six size-of-place categories.







dence in other sizes, and who therefore are not shown in this distribution of 
those with one exposure residence. In contrast, many of the persons who have 
experienced an exposure residence in rural farm places, or in either of the two 
largest city-size groups, have moved to other sizes of place. Thus these places 
include larger proportions of the exposure residence experience than of the 
currently resident population. 

Twenty-three per cent of the adult population have two exposure residences. 
Their distribution among the fifteen combinations of the six size-of-place cate- 
gories is given in Table 3, above and to the right of the diagonal. Below and to 


TABLE 3. SIZES OF PLACES OF EXPOSURE RESIDENCES, PERSONS
WITH TWO EXPOSURE RESIDENCES,ᵃ AGES 18 AND OVER,
UNITED STATES, 1958ᵇ

                                         Size of place
Size of place     500,000-   50,000-   10,000-   2,500-   Rural     Rural
                                                          nonfarm   farm
500,000-
50,000-
10,000-
2,500-
Rural nonfarm
Rural farm

a Two different sizes of place in which the person has lived 10 or more years, for persons who have accumulated 10 or more years' residence in two of the six size categories.
b Above and to right of diagonal: percentage distribution. Below and to left of diagonal: deviation from expected percentage (see text).


the left of the diagonal are given the deviations of the actual distribution from 
an expected distribution. This expected distribution was computed on the as- 
sumption that the relative frequency of each combination of two sizes is pro- 
portional to the product of the relative frequencies of each size among those 
with two exposure residences. Other expected distributions could be computed, 
e.g., using the size-of-place distribution from the 1950 Census, but the one used 
is adequate for taking rough account of the unequal distribution of exposures 
among the six size-of-place categories. 
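The expected distribution used for Table 3 can be computed directly from the observed percentages. A sketch, assuming the observed distribution is supplied as a dictionary keyed by unordered pairs of size labels (a hypothetical format). Because the expected shares are rescaled to sum to 100, it makes no difference whether a size's relative frequency is taken as its share of all exposures or as the share of persons having it; the two differ only by a constant factor of 2.

```python
from itertools import combinations


def expected_and_deviations(actual):
    """Expected percentages and deviations for two-size combinations.

    `actual` maps frozenset({size_a, size_b}) -> observed percentage.
    A pair's expected share is proportional to the product of the two
    sizes' relative frequencies, rescaled so the shares sum to 100.
    """
    sizes = sorted({s for pair in actual for s in pair})
    total = 2 * sum(actual.values())  # each person contributes two exposures
    freq = {s: sum(p for pair, p in actual.items() if s in pair) / total
            for s in sizes}
    raw = {frozenset((a, b)): freq[a] * freq[b] for a, b in combinations(sizes, 2)}
    scale = 100.0 / sum(raw.values())
    expected = {pair: v * scale for pair, v in raw.items()}
    deviation = {pair: actual.get(pair, 0.0) - e for pair, e in expected.items()}
    return expected, deviation


# Hypothetical three-size illustration (not the figures of Table 3).
obs = {frozenset({"farm", "rural nonfarm"}): 50.0,
       frozenset({"farm", "city"}): 20.0,
       frozenset({"rural nonfarm", "city"}): 30.0}
exp, dev = expected_and_deviations(obs)
print(round(exp[frozenset({"farm", "rural nonfarm"})], 1))  # 42.7
```

With six sizes of place the function returns the fifteen combinations mentioned in the text; a positive deviation marks a combination that occurs more often than the marginal frequencies alone would suggest.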

Large deviations from the expected proportions occur for two combinations. Exposure residences in both farm places and places of 500,000+ occur much less often than expected, while the combination of exposures in farm and rural nonfarm places is considerably more frequent than expected. Most of the persons with exposure residences in farm places and another size category grew up on farms, and now live in the size of place of their second exposure.⁸ Out-migrants from farms show a tendency to locate in other rural areas or in small cities, and to avoid locating in large cities. A tendency for combinations of similar sizes of place to be over-represented is indicated also by the positive deviations for combinations among the three largest city sizes.





8 This statement is supported by unpublished tabulations permitting the order of residence to be inferred.







The deviations for combinations including cities under 10,000 and rural non- 
farm places are less systematic, and probably reflect the operation of two main 
patterns. One pattern is that already discerned—the tendency for combina- 
tions of similarly sized places to be over-represented and of dissimilarly sized 
places to be under-represented. Interfering with this pattern is the suburbaniza- 
tion process, which tends to increase combinations of the larger city sizes with 
small cities or rural nonfarm places. A distinction between metropolitan and non- 
metropolitan places as well as specification of the order in which the residences 
occurred would be necessary for differentiating these two patterns. 

Only two per cent of adults have three exposure residences. Table 4 presents 
the distribution among the twenty possible combinations of those with three 


TABLE 4. SIZES OF PLACES OF EXPOSURE RESIDENCES, PERSONS
WITH THREE EXPOSURE RESIDENCES,ᵃ AGES 18 AND OVER,
UNITED STATES, 1958ᵇ

Largest          Intermediate                Smallest size of place
size of place    size of place     10,000-   2,500-   Rural nonfarm   Farm

500,000-         50,000-
500,000-         10,000-
500,000-         2,500-
500,000-         Rural nonfarm
50,000-          10,000-
50,000-          2,500-
50,000-          Rural nonfarm
10,000-          2,500-
10,000-          Rural nonfarm
2,500-           Rural nonfarm

a Three different sizes of place in which the person has lived 10 or more years, for persons who have accumulated 10 or more years' residence in three of the six size categories.
b Numbers in parentheses are deviations from expected percentages (see text).







exposure residences. An expected distribution was computed by the same proce- 
dure as for combinations of two exposure residences. The deviations of the actual 
from the expected proportions are given in parentheses in the table. These fig- 
ures are all subject to relatively large sampling errors, but in broad pattern are 
plausible. 
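The same procedure extends to three exposure residences by taking products over triples instead of pairs. A sketch generalized to combinations of any size `k`, using hypothetical input; with six size categories it yields the twenty combinations of Table 4 (and the fifteen pairs of Table 3 when `k = 2`).

```python
from itertools import combinations
from math import prod


def expected_shares(actual, k):
    """Expected % for each k-size combination.

    `actual` maps frozenset of k sizes -> observed percentage. Each expected
    share is proportional to the product of the sizes' relative frequencies,
    rescaled so that the shares sum to 100.
    """
    sizes = sorted({s for combo in actual for s in combo})
    total = k * sum(actual.values())  # every person contributes k exposures
    freq = {s: sum(p for combo, p in actual.items() if s in combo) / total
            for s in sizes}
    raw = {frozenset(c): prod(freq[s] for s in c) for c in combinations(sizes, k)}
    scale = 100.0 / sum(raw.values())
    return {c: v * scale for c, v in raw.items()}


def deviations(actual, k):
    """Observed minus expected percentage for every k-size combination."""
    expected = expected_shares(actual, k)
    return {c: actual.get(c, 0.0) - e for c, e in expected.items()}


sizes = ["500,000-", "50,000-", "10,000-", "2,500-", "rural nonfarm", "farm"]
uniform = {frozenset(c): 5.0 for c in combinations(sizes, 3)}  # hypothetical
print(len(expected_shares(uniform, 3)))  # 20
```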

Six combinations are over-represented. The three combinations with the 
largest positive deviations are combinations of a farm and a rural nonfarm 
exposure with each of the three smallest city sizes. The combination of the 
three largest city sizes is also over-represented. Combinations of dissimilarly 
sized places are under-represented, especially if one of the places is farm. Four 
of the five combinations with the largest negative deviations are accounted for 
by combinations of farm exposure and big city exposure with each of the other 
four sizes. 

The size distributions of two and three exposure residences are consistent 
with the generalization that migration is more commonly between places of 
similar size than between places of dissimilar size. This generalization makes 
plausible another feature of the data. Only one of every four persons has never 
migrated from his birthplace [12], but two of every three have accumulated 
ten or more years of residence within only one size of place. Many migrations 
therefore must occur between places within the same size grouping. 


4. DISCUSSION 


The Residence History Supplement has demonstrated that it is feasible to 
collect residence histories in routine surveys. With elimination of short-term 
residences and local moving within places, complete migration and residence
histories can be obtained for the great majority of persons with a schedule re- 
cording only five places of residence. 

An important but surprising result of the tabulation of exposure residences is 
the finding that most of the adult population has extensive residential experi- 
ence (ten or more years) in only one or two different sizes of place. This indi- 
cates a marked stability in the sizes of places in which people live, despite high 
rates of total residential mobility. 

Examination of the combinations of sizes of exposure residences for persons 
with two or three exposure residences disclosed a strong over-representation for 
combinations of similarly sized places. These data are consistent with a stage 
pattern of rural-urban migration. According to the hypothesis of a stage pat- 
tern, the redistribution of the population from farms to metropolises occurs as a 
series of moves from farm to village to town to city to metropolis [8, 10].⁹
Excluding suburbanization, the net pattern of migration in recent decades has 
been from rural and smaller places to larger cities. Given this direction of move- 
ment, the finding of migration occurring predominantly between places of simi- 
lar size is consistent with a stage process. Exposure residences combining dis- 





9 The national migration data for the United States have been poorly suited to analysis of stage patterning.
Support for stage patterning was found in 1949-1950 data by Duncan and Reiss [5, p. 85] and in 1935-40 data by 
Thompson [13, ch. 6]. Lively and Taeuber [7, p. 98], using various data for the 1930's, found no support for stage 
patterning. The 1958 residence history data are the first national data which could be used for the direct study of 
stage patterning, but the necessary tabulations have not been made. The complexities of such tabulations are demon- 
strated in Wendel's analysis of Swedish data [16]. 







similar places would be evidence against the stage patterning of rural-urban 
migration, but these combinations are relatively infrequent. 

The exposure residence data lend support to a refinement of the stage patterning hypothesis: the full stage transition from farm to metropolis frequently requires more than one generation, and seldom occurs within the residence histories of individuals. This inference rests on the rejection of two alternative assumptions. If the full stage transition were accomplished as a gradual process during the lifetime of an individual, persons with three or more exposure residences should be more common, and their three exposures would more often combine dissimilar sizes. If the full transition were accomplished rapidly, either directly or with brief periods of residence in intermediate places, the two-exposure-residence combinations of “farm and 500,000+” and “farm and 50,000-” should be more frequent. The data are inconsistent with either of these two assumptions, but are consistent with the suggested refinement.

The exposure residence data are inadequate for the unambiguous demonstra- 
tion of the validity of the stage pattern hypothesis. Sampling and reporting 
errors may account for some of the patterns. In any case, exposure residences 
do not indicate the sequence of moves. The direct study of stage migration 
would require different coding and tabulating of residence histories. 

The foregoing analysis has been based on simple tabulations of one summary 
measure of one set of residence history data. In conclusion, we emphasize the 
feasibility and the versatility of the residence history approach to the analysis of 
migration and population redistribution. The exposure residence concept is 
but one means of utilizing such data, and it ignores many of the unique features 
of residence histories. The information in residence histories can be coded and 
tabulated to obtain any desired data on the sequence of successive residences 
and migrations in the life histories of individuals.¹⁰ Residence histories do indeed
provide a “greater mass of data than could ever be used.” This fact should serve 
not as a deterrent but as a challenging opportunity for research. 


REFERENCES 


[1] Bogue, D. J., “Internal Migration,” Ch. 21 in Hauser, P. M. and Duncan, O. D., eds., 
The Study of Population. Chicago: University of Chicago Press, 1959, p. 489. 

[2] Bogue, D. J., “The Use of Place-of-Birth and Duration-of-Residence Data for Study- 
ing Internal Migration,” United Nations Economic and Social Council, United 
Nations Seminar on Evaluation and Utilization of Population Census Data in Latin 
America (E/CN. 9/CONF.1/L.10), Santiago, Chile (1959). 

[3] Buechley, R., Dunn, J. E., Linden, G., and Breslow, L., “Death Certificate State-
ments of Occupation: Its Usefulness in Comparing Mortalities,” Public Health Re-
ports, 71 (1956), 1105-11.

[4] Duncan, O. D., “Optimum Size of Cities,” in Spengler, J. J. and Duncan, O. D., eds., 
Demographic Analysis. Glencoe, Illinois: The Free Press, 1960, 372-85. 

[5] Duncan, O. D. and Reiss, A. J., Social Characteristics of Urban and Rural Communi- 
ties, 1950. New York: John Wiley and Sons, Inc., 1956. 

[6] Halbwachs, M., Population and Society. Glencoe, Illinois: The Free Press, 1960. 

[7] Lively, C. E. and Taeuber, C., Rural Migration in the United States. Washington: 
Government Printing Office, 1939. 





10 We plan additional tabulations of the 1958 residence histories to permit further exploitation of these data 
and exploration of alternative techniques of analysis. 










[8] Moore, J., Cityward Migration. Chicago: University of Chicago Press, 1938. 
[9] Pifer, J., “A Methodological Report of the National Lung Cancer Mortality Study,”
Vital Statistics Special Reports, forthcoming.

[10] Ravenstein, E. G., “The Laws of Migration,” Journal of the Royal Statistical Society, 
48 (1885), 167-235, and 52 (1889), 241-305. 

[11] Sirken, M. G., Haenszel, W., and Pifer, J., “Residence Histories of Deceased Persons,” 
Milbank Memorial Fund Quarterly, 38 (1960), 5-22. 
[12] Taeuber, K. E., “Duration-of-Residence Analysis of Internal Migration in the United
States,” Milbank Memorial Fund Quarterly, 39 (1961), 116-31.

[13] Thompson, W. S., Migration within Ohio, 1935-40. Oxford, Ohio: Scripps Foundation
for Research in Population Problems, Miami University, 1951.

[14] United States Bureau of the Census, “Concepts and Methods Used in the Current
Employment and Unemployment Statistics Prepared by the Bureau of the Census,”
Current Population Reports, Series P-23, No. 5, 1958.

[15] United States Bureau of the Census, “Mobility of the Population of the United States,
March 1957 to 1958,” Current Population Reports, Series P-20, No. 85, 1958.

[16] Wendel, B., “A Migration Schema, Theories and Observations,” Lund Studies in
Geography, Series B: Human Geography, No. 9 (1953).





A SIMPLE THEORETICAL APPROACH TO 
CUMULATIVE SUM CONTROL CHARTS 


N. L. Johnson
University College, London and Case Institute of Technology 


By considering the use of a cumulative sum control chart, in the 
way described by Page [3] and Barnard [2], as the application of two 
sequential probability ratio tests to the observed series taken in re- 
versed order, some approximate formulae for properties of this pro- 
cedure are obtained. These are simple formulae relating the probabili- 
ties of detection of a given change in average value to the position and 
slope of the critical limits. It is suggested that these results may some- 
times be useful as a guide to the limits to be used. 


1. INTRODUCTION 


THE use of cumulative sum control charts, as an alternative to standard charts for the mean of a normal population, has been described by Page [3, 4]. Barnard [2] has given a general description of the present position in regard to their construction and use. Here, the construction and use of these charts will be described in just sufficient detail for the reader to understand the basis and purpose of the subsequent analysis.

Cumulative sum control charts are intended to replace the standard form of 
control chart for controlling the mean of a normal population. Standard charts 
usually consist of a plot of observed sample means in sequence, against a back- 
ground of horizontal lines representing the “target” or “specification” mean, 
and inner (and/or outer) control limits at distances approximately twice 
(and/or three times) the (supposedly known) true standard deviation of the 
plotted sample means above and below the “target” mean. “Lack of control” 
is then indicated by the occurrence of observations outside the outer control 
limits, or by two successive observations outside the same inner control limit, 
or by some similar rule or combination of rules. 

In cumulative sum control charts, on the other hand, cumulative totals are plotted against the number of observations. If $\bar{x}_j$ denotes the mean of the $j$th sample, and $\sigma$ the (known) standard deviation of $\bar{x}_j$ (= (population standard deviation) × (sample size)$^{-1/2}$), then it is convenient to consider the points on the control chart as having co-ordinates $(m, X_m)$, where

$$X_m = \sigma^{-1}\left[(\bar{x}_1 - \mu) + (\bar{x}_2 - \mu) + \cdots + (\bar{x}_m - \mu)\right]$$

and $\mu$ is the “target” mean. (In practice this can be effected by calculating the cumulative sum

$$\sum_{j=1}^{m} \bar{x}_j - m\mu$$

and using an appropriate vertical scale.)
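As a minimal sketch, assuming (as the text does) that the target mean $\mu$ and the standard deviation $\sigma$ of a sample mean are known exactly, the plotted co-ordinates $(m, X_m)$ can be computed from a sequence of sample means. The function name `cusum_points` and the example numbers are hypothetical.

```python
from itertools import accumulate


def cusum_points(sample_means, mu, sigma):
    """Return the chart co-ordinates (m, X_m) for m = 1, 2, ...

    X_m = (1/sigma) * sum over j <= m of (xbar_j - mu), where mu is the target
    mean and sigma is the known standard deviation of one sample mean.
    """
    standardized = ((xbar - mu) / sigma for xbar in sample_means)
    return list(enumerate(accumulate(standardized), start=1))


# Hypothetical data: four sample means, target 10.0, sigma of a mean 0.5.
print(cusum_points([10.2, 9.9, 10.6, 10.8], mu=10.0, sigma=0.5))
```

Dividing by $\sigma$ at each step is equivalent to plotting the raw cumulative sum on a vertical scale graduated in units of $\sigma$, which is the practical shortcut mentioned in the text.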
The charts are interpreted by placing a mask (shaded in Figure 1) over the chart, with the point O over the last plotted point of the chart and OP horizontal. “Lack of control” is diagnosed if any points of the cumulative sum control chart are covered by the mask, i.e., if any points lie below the straight line $A_1B_1$ or above the straight line $A_{-1}B_{-1}$.

Figure 1. Mask for Cumulative Sum Control Chart

The dimensions of the mask are defined by the angle, $2\theta$, between $A_{-1}B_{-1}$ and $A_1B_1$, and the distance, $d$, between O, the mid-point of $B_{-1}B_1$, and P, the point of intersection of $A_{-1}B_{-1}$ and $A_1B_1$. It appears to be the common practice (see, e.g., Barnard [2, p. 242]) to determine $d$ and $\theta$ empirically. The aim is to make it unlikely that “lack of control” will be indicated when a process is in control, or that a marked change in population mean will remain undetected for long.

It is the purpose of this note to indicate how the standard theory of sequential probability ratio tests can be applied to provide a simple (though approximate) basis for choosing appropriate values of $\theta$ and $d$.


2. A SEQUENTIAL ANALOGY 


The essential feature of the approach to be described here is that the application of the mask to the control chart is regarded as (roughly) equivalent to the application of a sequential probability ratio test “in reverse.” When O is placed at the point $(m, X_m)$, the lines $A_{-1}B_{-1}$ and $A_1B_1$ may be regarded as the lower and upper outer limits of a chart of the type used by Armitage [1] for distinguishing between the three hypotheses

$H_{-1}$ (mean $= -\delta$), $H_0$ (mean $= 0$), $H_1$ (mean $= \delta$),

with symmetrical error probabilities, applied to the sequence of observations

$$y_1 = \sigma^{-1}(\bar{x}_m - \mu), \quad y_2 = \sigma^{-1}(\bar{x}_{m-1} - \mu), \quad \text{and so on.}$$


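The likelihood ratio of $H_1$ to $H_0$, based on $y_1, y_2, \ldots, y_k$, is

$$\frac{p(y_1, y_2, \ldots, y_k \mid H_1)}{p(y_1, y_2, \ldots, y_k \mid H_0)} = \frac{(2\pi)^{-k/2}\exp\left[-\frac{1}{2}\sum_{j=1}^{k}(y_j - \delta)^2\right]}{(2\pi)^{-k/2}\exp\left[-\frac{1}{2}\sum_{j=1}^{k} y_j^2\right]} = \exp\left[\delta\left(\sum_{j=1}^{k} y_j - \frac{1}{2}k\delta\right)\right].$$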
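The simplification of the likelihood ratio can be checked numerically. This sketch assumes, as the text does, that each standardized $y_j$ is normal with unit variance and mean $\delta$ under $H_1$ or $0$ under $H_0$; the observations and function names are made up for illustration.

```python
import math
from statistics import NormalDist


def log_lr_direct(ys, delta):
    """log p(y | H1) - log p(y | H0), with y_j ~ N(delta, 1) under H1 and N(0, 1) under H0."""
    h1, h0 = NormalDist(delta, 1.0), NormalDist(0.0, 1.0)
    return sum(math.log(h1.pdf(y)) - math.log(h0.pdf(y)) for y in ys)


def log_lr_closed_form(ys, delta):
    """The simplified exponent: delta * (sum of y_j - k * delta / 2)."""
    return delta * (sum(ys) - 0.5 * len(ys) * delta)


ys = [0.3, 1.7, -0.4, 2.2, 0.9]  # arbitrary standardized observations
for delta in (0.5, 1.0, 1.5):
    print(delta, log_lr_direct(ys, delta), log_lr_closed_form(ys, delta))
```

The two columns agree up to rounding error, because the $(2\pi)^{-k/2}$ factors and the $y_j^2$ terms cancel.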





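Hence, the upper limit (above which $H_0$ is rejected) of the sequential probability ratio test comparing $H_0$ and $H_1$, with approximate error probabilities $\alpha_0$ and $\alpha_1$, where

$$\Pr\{\text{reject } H_0 \mid H_0\} = \alpha_0, \qquad \Pr\{\text{accept } H_0 \mid H_1\} = \alpha_1,$$

is defined by

$$\delta\left(\sum_{j=1}^{k} y_j - \frac{1}{2}k\delta\right) = \log_e\frac{1-\alpha_1}{\alpha_0},$$

i.e.,

$$\sum_{j=1}^{k} y_j = \delta^{-1}\log_e\frac{1-\alpha_1}{\alpha_0} + \frac{1}{2}\delta k. \tag{1}$$

Similarly the lower limit (below which $H_0$ is rejected) of the sequential probability ratio test comparing $H_0$ and $H_{-1}$, with the same approximate error probabilities $\alpha_0$ and $\alpha_1$, is

$$\sum_{j=1}^{k} y_j = -\delta^{-1}\log_e\frac{1-\alpha_1}{\alpha_0} - \frac{1}{2}\delta k. \tag{2}$$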
The two tests, combined according to the scheme proposed by Armitage [1], can be represented as in Figure 2. The probability of a false indication of lack of control, when the process really is in control, is then approximately $2\alpha_0$. The mask in Figure 1 can be obtained by reflecting the appropriate parts of Figure 2 through the point O.


3. THE CONSTRUCTION OF THE MASK 
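
We see that the dimensions, $d$ and $\theta$, of the mask are defined by

$$\tan\theta = \tfrac{1}{2}\delta, \tag{3}$$

$$d = 2\delta^{-2}\log_e\frac{1-\alpha_1}{\alpha_0}. \tag{4}$$

$\alpha_1$ is usually rather small, so $\log_e(1-\alpha_1) \approx 0$ and

$$d = -2\delta^{-2}\log_e\alpha_0. \tag{5}$$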
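Formulae (3) to (5) are easy to evaluate directly. This sketch (the function names are hypothetical) computes $\theta$ in degrees and $d$, using (4) when $\alpha_1$ is supplied and the approximation (5) otherwise.

```python
import math


def mask_angle_degrees(delta):
    """theta from (3): tan(theta) = delta / 2, whatever the value of alpha0."""
    return math.degrees(math.atan(0.5 * delta))


def mask_distance(delta, alpha0, alpha1=None):
    """d from (4) if alpha1 is given, otherwise from the approximation (5)."""
    if alpha1 is None:
        return -2.0 * math.log(alpha0) / delta**2
    return 2.0 * math.log((1 - alpha1) / alpha0) / delta**2


for delta in (0.5, 0.75, 1.0, 1.25, 1.5):
    print(delta,
          round(mask_angle_degrees(delta), 1),
          round(mask_distance(delta, 0.01), 2),               # formula (5)
          round(mask_distance(delta, 0.01, alpha1=0.01), 2))  # formula (4)
```

The printed angles agree with the values of $\theta$ listed for $\delta$ = 0.5, ..., 1.5, and the values from (5) with $\alpha_0 = 0.01$ agree with the legible lower figures of Table 1 (for example 36.84 and 9.21). Including $\alpha_1$ through (4) shortens $d$ only slightly.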


Formulae (3) and (5) are very simple, and values of $\theta$ and $d$ for any particular case can readily be obtained. In order to appreciate the form of the relationship between the parameters $\delta$ and $\alpha_0$, and the values of $\theta$ and $d$, we note (a) that, whatever be the value of $\alpha_0$, $\theta$ is determined by $\delta$ so that, for example,

when $\delta$ = 0.5, 0.75, 1.0, 1.25, 1.5,
$\theta$ = 14.0°, 20.6°, 26.6°, 32.0°, 36.9°,

and (b) values of $d$ are given in Table 1 (lower rows of figures).

Figure 2. Sequential Probability Ratio Test Diagram (regions: accept $H_1$, accept $H_0$, accept $H_{-1}$)

These results suggest the following rules for deciding on the dimensions of 
TABLE 1. AVERAGE SAMPLE NUMBERS TO DETECT DIFFERENCE OF $\delta\sigma$ ($\sigma$ = STANDARD DEVIATION OF SAMPLE MEAN) BETWEEN SPECIFICATION AND TRUE MEAN*








$\alpha_0$ = 0.05    0.025    0.01    0.005    0.001    Combined Rule†








29.50 2.75 f 110.55 
36.84 2.39 
17.40 29.46 ; 51.67 
.12 16.37 .84 
.93 10.83 .38 
.38 9.21 .60 
2. 4.02 7.10 .82 30.42 
3.83 4.72 5.89 3.78 8.84 
2.45 3.10 4.89 .09 17.89 
2.66 3.28 4.09 oun 6.14 























* Upper figure relates to standard control chart (single upper limit).‡ Lower figure relates to cumulative sum control chart (and is equal to $d$ in (5)).

† One point above upper 0.1% limit, or two successive points above upper 2½% limit.

‡ This was calculated from the formula

$$\left[\frac{1}{\sqrt{2\pi}}\int_{u_{\alpha_0}-\delta}^{\infty} e^{-\frac{1}{2}u^2}\,du\right]^{-1}, \quad \text{where} \quad \frac{1}{\sqrt{2\pi}}\int_{u_{\alpha_0}}^{\infty} e^{-\frac{1}{2}u^2}\,du = \alpha_0.$$







the first mask to be used. (It is, of course, possible that a certain amount of 
trial and error may lead to some subsequent improvement.) 


(i) Decide on the least size of (absolute) change in the mean, $D$, say, which it is desired to detect rapidly and with fair certainty; let $\delta = \sigma^{-1}D$.
(ii) Decide on the approximate greatest tolerable probability, $2\alpha_0$, of false indication of lack of control.
(iii) Then take

$$\theta = \tan^{-1}\left(\tfrac{1}{2}\delta\right), \qquad d = -2\delta^{-2}\log_e\alpha_0.$$

[Note that the angle $\theta$ depends only on $\delta$, not on $\alpha_0$.]
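The three rules translate directly into a few lines of code. In this sketch the function and argument names are hypothetical, and the false-alarm input is the total probability $2\alpha_0$ from rule (ii), so it is halved before (5) is applied.

```python
import math


def design_mask(D, sigma, total_false_alarm):
    """Rules (i)-(iii): return (delta, theta in degrees, d).

    D: smallest absolute change in mean to be detected quickly.
    sigma: known standard deviation of a plotted sample mean.
    total_false_alarm: tolerable probability 2 * alpha0 of a false signal.
    """
    delta = D / sigma                               # rule (i)
    alpha0 = total_false_alarm / 2.0                # rule (ii)
    theta = math.degrees(math.atan(0.5 * delta))    # rule (iii), formula (3)
    d = -2.0 * math.log(alpha0) / delta**2          # rule (iii), formula (5)
    return delta, theta, d


# Detect a change of one standard deviation of the mean, with 2 * alpha0 = 0.02.
print(design_mask(D=0.4, sigma=0.4, total_false_alarm=0.02))
```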


4. INTERPRETATION 


The analogy with a reversed sequential test procedure may also contribute something to the interpretation of a cumulative sum control chart. The essential difference between the two procedures is that the use of the mask with the cumulative sum control chart makes no provision for definite acceptance of the hypothesis $H_0$ (i.e., no lack of control). If the series of observations extends far enough back, it is possible that a “lack of control” (i.e., a crossing of $A_1B_1$ or $A_{-1}B_{-1}$) will be signalled eventually, even though the corresponding sequential procedure would have terminated previously, with acceptance of $H_0$. It is suggested, therefore, that a “lack of control” first signalled at a point a long time before the test point O be regarded with some doubt. If $H_1$ (or $H_{-1}$) be true, the average sample number is approximately

$$-2\delta^{-2}\log_e\alpha_0 \;(= d). \tag{6}$$

If the point at which “lack of control” is first signalled is at a distance of, say, $3d$ or more before O, it would appear to be advisable to doubt this indication.
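The mask itself can be applied directly to the plotted points. This sketch makes the following assumptions:

* the chart is taken to start at $X_0 = 0$;
* $d$ comes from (5);
* a point $k$ steps before O is treated as covered when it lies more than $(d + k)\tan\theta$ below or above O.

A signal that first appears $3d$ or more steps back is flagged as doubtful, following the rule of thumb just stated. With $\tan\theta = \delta/2$ this covering condition is the same as crossing limit (1) or (2) for the last $k$ observations. The names and the example series are hypothetical.

```python
import math
from itertools import accumulate


def mask_signal(X, delta, alpha0):
    """Place O on the last point of X = [X_0, X_1, ..., X_m] and look back.

    The mask lines through P (a horizontal distance d ahead of O) have slope
    tan(theta) = delta / 2. Returns (direction, k, doubtful) for the nearest
    covered point, k steps before O, or None if no point is covered.
    'doubtful' is True when k >= 3d.
    """
    tan_theta = 0.5 * delta
    d = -2.0 * math.log(alpha0) / delta**2
    x_last = X[-1]
    for k in range(1, len(X)):
        half_width = (d + k) * tan_theta  # vertical gap from O's level to each mask line
        x_back = X[-1 - k]
        if x_back < x_last - half_width:
            return "increase", k, k >= 3 * d
        if x_back > x_last + half_width:
            return "decrease", k, k >= 3 * d
    return None


# Hypothetical example: a shift of 1.5 sigma in the last five samples.
X = [0.0] + list(accumulate([0.0] * 10 + [1.5] * 5))
print(mask_signal(X, delta=1.0, alpha0=0.01))
```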

At the expense of a slightly more complicated procedure, e.g., using a transparent sheet with appropriate limits (including the “accept $H_0$” limits) for the sequential test procedure marked on it, it would be possible to allow for direct acceptance of $H_0$ (i.e., “no lack of control”) instead of looking only for evidence of lack of control.


5. COMPARISON WITH STANDARD METHODS 


It is not easy to compare the operating properties of the standard and cumulative sum control charts directly. Table 1 contains some calculations which may help in assessing their relative performances. The upper figure in each cell is the expected number of observations before obtaining a single observation above the upper limit corresponding to the appropriate value of $\alpha_0$. The lower figure is the (approximate) average sample number of the sequential (cumulative sum) test procedure constructed with the given values of $\delta$ and $\alpha_0$ (given by (6)). Both figures are calculated on the assumption that there is a change of $\delta\sigma$ in the true population mean. The comparison may be unduly favorable to the sequential procedure, since this is known to be “optimal” for this value of the difference between the means, but the comparison can be extended to other sequential procedures, if desired, and the general nature of the comparison will not be drastically altered. The comparison is also not direct in that $2\alpha_0$ is a probability of error for a single observation in the standard case, and for a complete procedure in the cumulative sum case.
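The upper figures of Table 1 follow from the footnoted formula. After a shift of $\delta\sigma$, each plotted mean independently exceeds the upper $\alpha_0$ limit with probability $P(Z > u_{\alpha_0} - \delta)$, so the waiting time is geometric. The sketch below uses Python's standard-library `statistics.NormalDist` and sets formula (6) for the cumulative sum chart alongside it; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def standard_chart_run_length(delta, alpha0):
    """Expected observations until one point exceeds the upper alpha0 limit.

    u is the upper alpha0 point of N(0, 1); after a shift of delta (in units of
    sigma) each point exceeds it with probability P(Z > u - delta).
    """
    u = STD_NORMAL.inv_cdf(1.0 - alpha0)
    p = 1.0 - STD_NORMAL.cdf(u - delta)
    return 1.0 / p


def cusum_asn(delta, alpha0):
    """Approximate average sample number (6): -2 * delta**-2 * log_e(alpha0)."""
    return -2.0 * math.log(alpha0) / delta**2


print(standard_chart_run_length(1.5, 0.001), cusum_asn(1.5, 0.001))
```

For $\delta = 1.5$ and $\alpha_0 = 0.001$ this gives roughly 17.9 against 6.1, in line with the legible entries 17.89 and 6.14 in Table 1.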

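The final column of the table gives the expected number of observations before an indication of an increase in mean in the standard case where either an observation above $\mu + 3.09\sigma$ or two successive observations above $\mu + 1.96\sigma$ is regarded as such an indication. The formula used here is

$$\left[1 + \frac{1}{\sqrt{2\pi}}\int_{1.96-\delta}^{3.09-\delta} e^{-\frac{1}{2}u^2}\,du\right]\left[\frac{1}{\sqrt{2\pi}}\int_{3.09-\delta}^{\infty} e^{-\frac{1}{2}u^2}\,du + \left(\frac{1}{\sqrt{2\pi}}\int_{1.96-\delta}^{3.09-\delta} e^{-\frac{1}{2}u^2}\,du\right)\left(\frac{1}{\sqrt{2\pi}}\int_{1.96-\delta}^{\infty} e^{-\frac{1}{2}u^2}\,du\right)\right]^{-1}.$$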
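The formula can be read as the mean time to a signal for a two-state chain, in which the previous point either was or was not between the two limits. Write $p$ for the probability of a point above $\mu + 3.09\sigma$ and $q$ for the probability of a point between $\mu + 1.96\sigma$ and $\mu + 3.09\sigma$. The expression above is then $(1+q)/[p + q(p+q)]$. The sketch below assumes independent normal sample means with common $\sigma$; the function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)


def combined_rule_run_length(delta, outer=3.09, inner=1.96):
    """Expected observations to a signal after the mean shifts by delta * sigma.

    Signal: one point above mu + outer*sigma, or two successive points above
    mu + inner*sigma. p = P(point above outer limit); q = P(point between
    the limits); p + q = P(point above inner limit).
    """
    p = 1.0 - Z.cdf(outer - delta)
    q = Z.cdf(outer - delta) - Z.cdf(inner - delta)
    return (1.0 + q) / (p + q * (p + q))


for delta in (0.5, 0.75):
    print(delta, round(combined_rule_run_length(delta), 2))
```

For $\delta = 0.5$ and $0.75$ this gives about 110.5 and 51.7, close to the legible entries 110.55 and 51.67 in the final column of Table 1.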


Bearing in mind the imperfections of the figures in the table as a basis of comparison, it does, however, seem possible to infer that the cumulative sum chart will give noticeably more rapid indication of a change in mean if a small enough value of $\alpha_0$ is being used. Since a value of about 0.001 is customary in control chart work, this is consistent with the marked advantage for cumulative sum control charts that most empirical investigations have indicated. The advantage decreases sharply as $\alpha_0$ increases, however. A further point of interest is that the advantage of the cumulative sum chart appears to be more firmly established (i.e., over a wider range of $\alpha_0$) for larger values of $\delta$, the (standardized) change in mean value.


6. CONCLUSION 


A further paper is in preparation, in collaboration with F. C. Leone, which 
will discuss some of the more practical aspects of the applications of some of the 
results of the present paper in control chart work. In particular, it is hoped to 
discuss ways of facilitating choice of appropriate limits and, in case it may on 
occasion be useful, direct acceptance of the existence of a state of statistical 
control, as described in section 4. Some industrial examples will also be in- 
cluded. 

It is also hoped to discuss the extension of this method to cases where the 
mean varies other than by discrete jumps. 


REFERENCES 


[1] Armitage, P., “Sequential analysis with more than two alternative hypotheses and its relation to discriminant analysis,” Journal of the Royal Statistical Society, Series B, 12 (1950), 137-44.

[2] Barnard, G. A., “Control charts and stochastic processes,” Journal of the Royal Statistical Society, Series B, 21 (1959), 239-57 (Discussion pp. 257-71).

[3] Page, E. S., “Continuous inspection schemes,” Biometrika, 41 (1954), 100-14.

[4] Page, E. S., “On problems in which a change in a parameter occurs at an unknown point,” Biometrika, 44 (1957), 248-52.





STATISTICAL METHODS FOR THE MOVER-STAYER MODEL* 


LEO A. GOODMAN
University of Chicago 


1. Introduction and Summary
2. The Mover-Stayer Model
3. Indirect Methods of Estimation
   3.1 The Estimation of $M^{\infty}$
   3.2 The Estimation of $S$
   3.3 The Estimation of $M$
4. More Direct Methods of Estimation
   4.1 Estimation Methods that Use Information Concerning the $f_i^{(n)}$
      4.1.1 The estimation of $M$
      4.1.2 The estimation of $S$
      4.1.3 The estimation of $M^{\infty}$
   4.2 Estimation Methods that Use Additional Information
5. Comparisons of the Indirect and the More Direct Methods of Estimation
6. The Testing of Hypotheses
7. Some Supplementary Remarks
8. References


The mover-stayer model, a generalization of the Markov chain model, 
assumes that there are two types of individuals in the population under 
consideration: (a) the “stayer,” who with probability one remains in 
the same category during the entire period of study; (b) the “mover,” 
whose changes in category over time can be described by a Markov 
chain with constant transition probability matrix. The transition 
probability matrix for movers, and the proportion of stayers among the 
individuals in each category at, say, the initial point in time, are un- 
known parameters. Various estimators of these parameters are pre- 
sented herein, and the accuracy of these estimators is compared. We 
show, for example, that the estimators of these parameters used by 
Blumen, Kogan, and McCarthy [3] are not consistent estimators, while 
the estimators recommended herein are. In addition, tests of several
hypotheses concerning the mover-stayer model are presented herein. 
The methods developed in this paper can be applied to the study of 
various phenomena where panel data are available. 


1. INTRODUCTION AND SUMMARY 


THE mover-stayer model was first introduced by I. Blumen, M. Kogan and P. J. McCarthy (referred to as BKM herein) in their interesting study [3]
of the movement of workers among various industrial aggregates in the U. S.
This model assumes that each worker is either a “mover” or a “stayer,” that 
stayers do not move, and that the movement of movers can be described by a 
particular kind of probability process, a Markov chain. (Discussion of this prob- 
ability process appears in Section 2 herein.) In the present paper we shall study 
the problems of estimating the parameters in this model and of testing various 
hypotheses concerning it. 





* Part of this research was carried out at the Department of Statistics, University of Chicago, under sponsor- 
ship of the Logistics and Mathematical Statistics Branch, Office of Naval Research; and part at Columbia Univer- 
sity, under sponsorship of the Office of Naval Research (Contract Number Nonr-266(33), Project Number NR 
042-034), while the author was a Visiting Professor of Mathematical Statistics and Sociology there. Reproduction 
in whole or in part is permitted for any purpose of the United States Government. Some of this work was first 
presented at a joint session at the annual meetings of the American Statistical Association and the Institute of 
Mathematical Statistics, 30 December, 1955. 




Several modifications of the estimation method used by BKM [3] are sug- 
gested in the present paper, and alternative (more direct) estimation methods 
are also developed here. The accuracy of these estimation methods is studied 
herein. We find, for example, that the estimators actually used by BKM were 
not consistent estimators; the estimators recommended herein are. Reanalysis 
of some of the data presented by BKM, using methods suggested herein, indi- 
cates that the earlier estimation methods led to certain kinds of systematic 
biases; e.g., the estimates given by BKM of the fraction of stayers among the 
workers studied were usually too large. 

Justification is presented in the present paper for certain significance tests of 
the kind applied by BKM, and it is also shown that the usual interpretation of 
certain other standard tests presented by BKM cannot be justified when ap- 
plied to the mover-stayer model. Tests of significance are developed herein spe- 
cifically for the mover-stayer model; e.g., we present a method for testing the 
null hypothesis that all the workers studied are movers. Some of the tests pre- 
sented here are related to, but different from, the tests that have already been 
suggested for the analysis of Markov chains (see, e.g., [1], [2], [9]). 

The actual observations used by BKM [3] were based on data derived from
the Continuous Work History Sample (CWHS) of the Bureau of Old-Age and
Survivors Insurance (BOASI) (see [13]). The data derived from the CWHS are,
in a sense, panel data for a sample of individuals whose work histories appear 
in the BOASI files. The methods presented in the present paper and those pre- 
sented by BKM for treating these data may be useful for analyzing other kinds 
of panel studies and also for studying other kinds of phenomena (see, e.g., re- 
lated comments in [7]). 

The research results presented by BKM constitute an important example of 
the use of probability processes in the study of social phenomena. The general 
point of view and the substantive contribution of the work of BKM warrant 
its careful perusal. Reviews of this work [3] have been written by W. Feller 
[7], R. R. Bush and B. P. Cohen [4], E. Kitagawa [10], and T. A. Mahoney 
[12]. For further discussion of this work, the reader is referred to these reviews. 


2. THE MOVER-STAYER MODEL 


The mover-stayer model can be described as follows: Industries are grouped into a finite number, $I$, of industrial code categories. In the $i$th code category ($i = 1, 2, \ldots, I$), there are two kinds of workers, the stayers and the movers. Let $s_i$ denote the proportion of workers in the $i$th code category in the initial quarter who are stayers ($i = 1, 2, \ldots, I$). Then $1 - s_i$ is the proportion of workers in the $i$th category in the initial quarter who are movers. It is assumed that each stayer remains in a particular code category with probability one, and that each mover changes his code category over time in a way that can be described by the one-quarter transition probability matrix

$$M = \left\| m_{ij} \right\|,$$

where $m_{ij}$ is the probability that a mover who is in the $i$th code category in a particular quarter will be in the $j$th category ($i, j = 1, 2, \ldots, I$) in the following quarter. (For purposes of analysis, a worker who was employed in two or more industrial code categories in a particular quarter was assigned for that quarter by BKM [3] to the industry from which he obtained the largest share of wages.) In the language of probability theory, it is assumed that the behavior of movers can be described by a Markov chain with a constant transition probability matrix, and that the behavior of the stayers can be described by a transition probability matrix with 1's in all the diagonal cells (where $i = j$). (For further discussion of transition probability matrices, see [1], [2], [6], [9].)

The transition probability matrix, $M$, describing the behavior of the movers, and the proportions, $s_i$, of workers in each code category in the initial quarter who are stayers ($i = 1, 2, \ldots, I$), are unknown parameters. Let $S = (s_1, s_2, \ldots, s_I)$. For the sake of simplicity, we assume throughout that $m_{ij} > 0$ and $1 > s_i \geq 0$, for $i, j = 1, 2, \ldots, I$. These parameters cannot be estimated directly from the data used by BKM, since there is no way of determining unequivocally which workers in a given code category in a specified quarter are movers and which are stayers. (A worker who remains in the $i$th code category between two consecutive quarters may be a stayer who does so with probability one, or a mover who does so with probability $m_{ii}$.) It is therefore necessary to devise indirect methods for estimating these quantities, and a procedure for doing this is given by BKM. In Section 3 herein we describe a somewhat simplified version of this procedure and suggest some modifications of it; in Section 4 we present somewhat more direct methods of estimation; and in Section 5 we compare these methods. The estimators recommended in Section 4 have smaller asymptotic variances than the corresponding estimators in Section 3.
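To make the model concrete, here is a small simulation sketch. It assumes, purely for illustration, that the stayer proportion in each initial category is applied as a fixed (rounded) count, and that each mover's quarter-to-quarter moves are independent draws from the rows of $M$. The function name, the example matrix, and the proportions are hypothetical. The output also shows why stayers cannot be identified from the data: a mover may also remain in one category for several quarters.

```python
import random


def simulate_panel(counts, s, M, quarters, seed=0):
    """Simulate category histories under the mover-stayer model.

    counts[i]: workers in category i in the initial quarter.
    s[i]: proportion of those workers who are stayers.
    M: one-quarter transition matrix for movers (each row sums to 1).
    Returns a list of (is_stayer, history); each history has quarters + 1 entries.
    """
    rng = random.Random(seed)
    categories = range(len(M))
    workers = []
    for i, n_i in enumerate(counts):
        n_stayers = round(s[i] * n_i)  # fixed split, for simplicity
        for w in range(n_i):
            is_stayer = w < n_stayers
            history = [i]
            for _ in range(quarters):
                current = history[-1]
                if is_stayer:
                    nxt = current  # stayers: 1's on the diagonal
                else:
                    nxt = rng.choices(categories, weights=M[current])[0]
                history.append(nxt)
            workers.append((is_stayer, history))
    return workers


M = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
panel = simulate_panel([50, 30, 20], [0.4, 0.0, 0.5], M, quarters=8, seed=1)
print(sum(1 for st, h in panel if not st and len(set(h)) == 1), "movers never moved")
```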


3. INDIRECT METHODS OF ESTIMATION 
3.1 The Estimation of $M^{\infty}$


For present purposes, we assume that the transition probability matrix, $M$, describing the behavior of the movers is constant. (We shall discuss in Section 6 the problem of testing whether $M$ is, in fact, constant; no doubt with changing economic conditions, $M$ would actually not remain constant over a long time period.) From the theory of Markov chains (see, e.g., [6]), we know that the $n$th power of $M$ (i.e., $M^n$) is the $n$-quarter transition probability matrix for movers; i.e.,

$$M^n = \left\| m_{ij}^{(n)} \right\|,$$

where $m_{ij}^{(n)}$ is the probability that a mover who is in the $i$th code category in the initial quarter will be in the $j$th category in the $(n+1)$th quarter. BKM assume that, as $n$ approaches infinity, the limit of $M^n$ exists, and they present a method for estimating this limit, $M^{\infty}$. The estimates of $M$ and $S$ presented by BKM cannot be computed until this estimate of $M^{\infty}$ has been computed; i.e., the estimators of $M$ and $S$ given by BKM are functions of the estimator of $M^{\infty}$. Thus, the first and, in a sense, basic step in the estimation procedure of BKM is the estimation of $M^{\infty}$. It is, therefore, of some value to develop as accurate an estimator of $M^{\infty}$ as possible. We shall now present a somewhat simplified version of the BKM estimator of $M^{\infty}$, and shall suggest modifications that increase the accuracy of this estimator.
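The role of $M^n$ and its limit can be seen numerically. This sketch uses plain Python lists and repeated multiplication, which is adequate for small matrices. The 3×3 example matrix and the function names are hypothetical. As $n$ grows, every row of $M^n$ approaches the same vector, the common row $(m_1, \ldots, m_I)$ of $M^{\infty}$. For this example that vector is (0.4375, 0.3125, 0.25).

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_power(M, n):
    """M**n for n >= 1 by repeated multiplication: the n-quarter matrix for movers."""
    result = M
    for _ in range(n - 1):
        result = mat_mult(result, M)
    return result


M = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
for n in (1, 4, 16, 64):
    print(n, [[round(x, 4) for x in row] for row in mat_power(M, n)])
```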
The matrix $M^{\infty}$ is of the form

$$M^{\infty} = \begin{Vmatrix} m_1 & m_2 & \cdots & m_I \\ m_1 & m_2 & \cdots & m_I \\ \vdots & \vdots & & \vdots \\ m_1 & m_2 & \cdots & m_I \end{Vmatrix},$$

where

$$m_j = \lim_{n \to \infty} m_{ij}^{(n)},$$

for $i = 1, 2, \ldots, I$. Thus, $m_j$ is the approximate probability (for large $n$) that a mover will be in the $j$th code category in the $(n+1)$th quarter, and this probability does not depend on the category in which the mover was in the initial quarter. This process is observed in the initial quarter and in the $(n+1)$th quarter, where $n$ is sufficiently large so that $M^n$ approximates $M^{\infty}$. (A method for testing whether $M^n$ approximates $M^{\infty}$ will be presented in Section 6 herein.) A cross-classification table of the following kind is obtained:

$$\begin{array}{c|cccc} & 1 & 2 & \cdots & I \\ \hline 1 & - & x_{12} & \cdots & x_{1I} \\ 2 & x_{21} & - & \cdots & x_{2I} \\ \vdots & \vdots & \vdots & & \vdots \\ I & x_{I1} & x_{I2} & \cdots & - \end{array}$$

where $x_{ij}$ is the number of workers who are in the $i$th code category in the initial quarter and in the $j$th category ($j \neq i$) in the $(n+1)$th quarter. The diagonal entries (when $i = j$) are not given in this table since we wish to estimate the matrix for movers, and we do not know how many movers are included among those workers who were in the same category in the initial quarter and in the $(n+1)$th quarter. (The table entries used by BKM [3] were

$$\bar{x}_{ij} = \sum_{t=1}^{V} x_{ij}(t)/V,$$

where $x_{ij}(t)$ is the number of workers who are in the $i$th code category in the $t$th quarter and in the $j$th category ($j \neq i$) in the $(n+t)$th quarter ($t = 1, 2, \ldots, V$). For $V = 1$, $\bar{x}_{ij} = x_{ij}$. For the sake of simplicity, we take $V = 1$ in the present section. The case where $V > 1$ will be discussed in Section 4.2.) Let





$$x_{i\cdot} = \sum_{j \neq i} x_{ij}$$

denote the total number of workers appearing in the table in code category $i$ in the initial quarter, let

$$x_{\cdot i} = \sum_{j \neq i} x_{ji}$$

denote the total number of workers appearing in the table in code category $i$ in the $(n+1)$th quarter, and let

$$x_{\cdot\cdot} = \sum_{i=1}^{I} \sum_{j \neq i} x_{ij}$$

denote the total number of workers appearing in the table. The maximum likelihood estimate of $M^{\infty}$ (i.e., $m_1, m_2, \ldots, m_I$) based upon the data in this table (when $I > 2$) is given as the solution of the system of equations

$$\sum_{j=1}^{I} m_j = 1, \qquad \lambda m_i^2 + m_i(-\lambda - x_{\cdot i} + x_{i\cdot}) + x_{\cdot i} = 0, \tag{3.1}$$

for $i = 1, 2, \ldots, I$, where $m_1, m_2, \ldots, m_I$ and $\lambda$ are unknown (see [3, p. 46]). BKM suggest that a good approximation to the solution of these equations is given by taking $m_i^* = x_{\cdot i}/x_{\cdot\cdot}$, and they claim that the accuracy of this approximation depends on $(x_{\cdot i} - x_{i\cdot})/x_{\cdot\cdot}$ being small for $i = 1, 2, \ldots, I$.

The expected value of $x_{\cdot j}$ is

$$\sum_{i \neq j} y_i m_{ij}^{(n)},$$

where $y_i$ is the number of movers in the $i$th code category in the initial quarter. When $n \to \infty$, this expected value approaches

$$\sum_{i \neq j} y_i m_j.$$

Writing

$$\sum_{i=1}^{I} y_i = y_{\cdot},$$

we note that the expected value of $x_{\cdot j}$ approaches $(y_{\cdot} - y_j)m_j$, and the expected value of $x_{\cdot\cdot}$ approaches

$$y_{\cdot} - \sum_{i=1}^{I} y_i m_i$$

when $n \to \infty$. The statistic $m_j^*$ converges in probability to $(1 - z_j)m_j/(1 - \sum_i z_i m_i)$ when $z_i = y_i/y_{\cdot}$ is a fixed constant and $y_{\cdot} \to \infty$, $n \to \infty$. Thus, $m_j^*$ is not a consistent estimator of $m_j$ (except in the special case when $z_i = 1/I$ for $i = 1, 2, \ldots, I$). This fact has led us to reexamine the estimation methods used by BKM.
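The limiting value of $m_j^*$ is easy to tabulate, which makes the inconsistency concrete. In this sketch the shares $z_j$ and the true $m_j$ are made-up inputs, and the function name is hypothetical. Because the limit is proportional to $(1 - z_j)m_j$, categories that initially hold a large share of the movers are understated and the others overstated. The limits still sum to one.

```python
def bkm_limit(z, m):
    """Probability limit of m_j* = x_.j / x_.. as y_. and n grow (text's formula).

    z[j] = y_j / y_. is the share of movers initially in category j, and
    m[j] is the true long-run probability for category j.
    """
    denom = 1.0 - sum(zj * mj for zj, mj in zip(z, m))
    return [(1.0 - zj) * mj / denom for zj, mj in zip(z, m)]


m = [0.2, 0.3, 0.5]
print(bkm_limit([1 / 3, 1 / 3, 1 / 3], m))  # equal shares: recovers m
print(bkm_limit([0.7, 0.2, 0.1], m))        # unequal shares: differs from m
```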










If it is assumed, as was done by BKM, that the $(x_{\cdot i} - x_{i\cdot})/x_{\cdot\cdot}$ are negligible, then the system of equations (3.1) is approximately

$$\sum_{j=1}^{I} m_j = 1, \qquad [\lambda m_i^2 + m_i(-\lambda) + x_{\cdot i}]/x_{\cdot\cdot} = 0, \tag{3.2}$$

for $i = 1, 2, \ldots, I$. If the additional assumption is made that the $m_i$ are small ($i = 1, 2, \ldots, I$), so that the $m_i^2$ are also negligible, then (3.2) is approximately

$$\sum_{j=1}^{I} m_j = 1, \qquad m_i(-\lambda) + x_{\cdot i} = 0, \tag{3.3}$$

for $i = 1, 2, \ldots, I$, which leads to the approximate solution $m_i^* = x_{\cdot i}/x_{\cdot\cdot}$ given by BKM. (The latter assumption is not explicitly mentioned, however, by BKM.) If any of the $m_i^2$ terms are not negligible, then this approximation can be quite inaccurate, as we shall see below.

To illustrate the effect of neglecting the $m_i^2$ terms, we shall compare the solution of (3.3) (i.e., the $m_i^*$) with the solution of (3.2), using the data presented by BKM. The estimates obtained from (3.2) are given in Table 1 herein, along with the estimates, $m_i^*$, obtained by BKM for females in the 40-44 age classification.

The difference between the estimates of $m_U$ in Table 1 is .640 − .412, or 22.8 percentage points, which is a relative difference of .228/.640, or 35 per cent. The relative difference between the estimates of $m_K$ is (.006 − .003)/.003, or 100 per cent. Hence, $m_i^*$ is a first approximation which should be replaced by the solution of the system of equations (3.2) if the $(x_{\cdot i} - x_{i\cdot})/x_{\cdot\cdot}$ are negligible; if they are not negligible, then $m_i^*$ should be replaced by the solution of (3.1). The accuracy of the approximations used to estimate $M^{\infty}$ can affect the subsequent estimates of $S$ and $M$ presented by BKM, as we shall see in the following sections.


TABLE 1. ESTIMATES OF THE FRACTION OF MOVERS WHO ARE EXPECTED (IN THE LONG RUN) TO BE IN EACH OF THE ELEVEN CODE CATEGORIES, FOR FEMALES IN THE 40-44 AGE CLASSIFICATION

Code Category    Solution of (3.2)    BKM Estimate, $m_i^*$
A                .001                 .001
B                .005                 .009
C                .074                 .122
D                .018                 .032
E                .027                 .047
F                .012                 .022
G                .137                 .211
H                .015                 .026
J                .067                 .112
K                .003                 .006
U                .640                 .412













We now present some practical techniques for finding a solution of the system of equations (3.1) or (3.2). For the sake of simplicity, we first consider (3.2). From (3.2) we see that

$$m_i(1 - m_i) = x_{\cdot i}/\lambda, \tag{3.4}$$

so that the $m_i$ are monotonically decreasing functions of $\lambda$ (when all $m_i < \tfrac{1}{2}$). Thus, $\sum_i m_i$ is also a monotonically decreasing function of $\lambda$, which simplifies the problem of determining $\lambda$ so that $\sum_i m_i = 1$. The following considerations will lead to an iterative scheme for determining $\lambda$. From (3.2) we have that

$$\lambda\left[\sum_i m_i - \sum_i m_i^2\right] = \sum_i x_{\cdot i} = x_{\cdot\cdot}. \tag{3.5}$$

Hence

$$\lambda = x_{\cdot\cdot}\Big/\left(1 - \sum_i m_i^2\right). \tag{3.6}$$

Thus, we see that

$$\lambda \geq x_{\cdot\cdot}. \tag{3.7}$$

Also, since $m_i(1 - m_i) \leq \tfrac{1}{4}$, we see from (3.4) that

$$\lambda \geq 4 \max_i x_{\cdot i}. \tag{3.8}$$

From (3.7) and (3.8) we have

$$\lambda \geq \max\left[x_{\cdot\cdot},\; 4 \max_i x_{\cdot i}\right]. \tag{3.9}$$

Let $\lambda_1$ denote the minimum possible value of $\lambda$ determined from (3.9). If the estimates $\hat{m}_i$, which are obtained by using $\lambda_1$ in (3.4), are such that $\sum_i \hat{m}_i > 1$ (when all $\hat{m}_i$ are taken $\leq \tfrac{1}{2}$), then the $\hat{m}_i$ are too large, and $\sum_i \hat{m}_i^2$ is also too large. Hence, the estimate of $\lambda$ obtained by taking $\lambda_2 = x_{\cdot\cdot}/(1 - \sum_i \hat{m}_i^2)$ will be too large. Using the value of $\lambda_2$, new estimates of $m_i$ can be obtained, which are too small. With these new estimates of $m_i$, a new estimate of $\lambda$ can be obtained from the relation (3.6). Continuing this iterative procedure will lead to the solution of (3.2).
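The monotonicity noted above also makes a simple root-finder possible. The sketch below solves (3.2) under the same restriction that every $m_i \leq \tfrac{1}{2}$. It starts from $\lambda_1$ of (3.9) and then, as a deliberate substitution for the alternating iteration just described, brackets and bisects on $\lambda$. Bisection relies only on $\sum_i m_i$ decreasing in $\lambda$. The function names and the column totals in the example are hypothetical.

```python
import math


def m_from_lambda(col_totals, lam):
    """Root of m(1 - m) = x_.i / lam from (3.4) with m <= 1/2."""
    return [(1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * x / lam))) / 2.0 for x in col_totals]


def solve_32(col_totals, tol=1e-12):
    """Solve (3.2) for m_1, ..., m_I assuming every m_i <= 1/2; returns (m, lambda).

    Bisection on lambda is used instead of the alternating scheme in the text.
    """
    x_dd = sum(col_totals)
    lam_lo = max(x_dd, 4.0 * max(col_totals))  # lambda_1 from (3.9)
    if sum(m_from_lambda(col_totals, lam_lo)) < 1.0:
        raise ValueError("no solution of (3.2) with all m_i <= 1/2")
    lam_hi = 2.0 * lam_lo
    while sum(m_from_lambda(col_totals, lam_hi)) > 1.0:
        lam_hi *= 2.0  # widen the bracket until the sum falls below 1
    while lam_hi - lam_lo > tol * lam_hi:
        mid = 0.5 * (lam_lo + lam_hi)
        if sum(m_from_lambda(col_totals, mid)) > 1.0:
            lam_lo = mid
        else:
            lam_hi = mid
    lam = 0.5 * (lam_lo + lam_hi)
    return m_from_lambda(col_totals, lam), lam


m, lam = solve_32([20, 25, 25, 30])
print([round(v, 3) for v in m], round(lam, 2))
```

For the column totals (20, 25, 25, 30), the approximation $m_i^* = x_{\cdot i}/x_{\cdot\cdot}$ gives (.20, .25, .25, .30), whereas the solution of (3.2) is roughly (.18, .24, .24, .33). As in Table 1, the approximation understates the largest $m_i$ and overstates the smaller ones. For totals such as (10, 20, 30) the function raises an error, which is the case treated in the next paragraph.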

It may happen that the estimates $\hat{m}_i$, which are obtained using $\lambda_1$, are such that $\sum_i \hat{m}_i < 1$. In this case, a solution of (3.2) does not exist with all $m_i \leq \tfrac{1}{2}$. The case where one of the $m_i$ is greater than $\tfrac{1}{2}$ is more difficult to treat. Graphical methods will sometimes be helpful. These can be based upon the fact that from (3.2) we have

$$\log m_i(1 - m_i) = \log x_{\cdot i} - \log\lambda; \tag{3.10}$$

by plotting $\log x_{\cdot i}$ (for $i = 1, 2, \ldots, I$) using the ordinate scale of semi-logarithmic paper, and by drawing the graph of $\log m(1 - m)$ using a second sheet of such paper, the values of $m_i$ that satisfy these equations can be determined, for each value of the additive constant $-\log\lambda$, by inspection of the graphs.

The techniques described above for solving (3.2) can be modified in a straightforward manner in order to solve (3.1). The equation (3.4) should be replaced by

$$m_i(1 - m_i) = [x_{\cdot i} + m_i(x_{i\cdot} - x_{\cdot i})]/\lambda, \tag{3.11}$$

from which it follows, as before, that the $m_i$ are monotonically decreasing functions of $\lambda$ (when all $m_i < \tfrac{1}{2}$), and that $\sum_i m_i$ is also a monotonically decreasing function of $\lambda$, thus simplifying the problem of determining $\lambda$ so that $\sum_i m_i = 1$. The equation (3.10) should be replaced by

$$\log m_i(1 - m_i) = \log[x_{\cdot i} + m_i(x_{i\cdot} - x_{\cdot i})] - \log\lambda, \tag{3.12}$$

which permits the use of graphical methods to determine, for each value of the additive constant $-\log\lambda$, the values of $m_i$ that satisfy these equations.

The estimators of $M^{\infty}$ presented in this section (i.e., the solution of (3.1), or of (3.2) when appropriate) were based upon data obtained in the initial quarter and in the $(n+1)$th quarter (i.e., the $x_{ij}$ and $\bar{x}_{ij}$), and it was assumed that $n$ was sufficiently large so that $M^n$ approximates $M^{\infty}$. A different method of estimation, which does not require this assumption, will be presented in Section 4.1.3.


3.2 The Estimation of S 


The estimator of $s_i$ ($i = 1, 2, \ldots, I$) presented by BKM is obtained by first noting that

$$p_{ii}^{(n)} = s_i + (1 - s_i)\,m_{ii}^{(n)}, \tag{3.13}$$

where $p_{ii}^{(n)}$ is the expected proportion of workers in the $i$th code category in the initial quarter who were also in that category in the $(n+1)$th quarter. Equation (3.13) can be rewritten as

$$s_i = \left(p_{ii}^{(n)} - m_{ii}^{(n)}\right)\Big/\left(1 - m_{ii}^{(n)}\right). \tag{3.14}$$

Let $\hat{p}_{ii}^{(n)}$ denote the observed proportion of workers in the $i$th code category in the initial quarter who are also in that category in the $(n+1)$th quarter. Considering the parameters $S$ and $M$ as fixed constants, $\hat{p}_{ii}^{(n)}$ will be a consistent estimator of $p_{ii}^{(n)}$ when the number, $w_i$, of workers in the $i$th code category in the initial quarter is large. We therefore can use $\hat{p}_{ii}^{(n)}$ to estimate $p_{ii}^{(n)}$ in (3.14).
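When $M$ is known, equations (3.13) and (3.14) are exact inverses of one another. The sketch below checks this with hypothetical example values and function names, computing the $n$-quarter matrix by repeated multiplication. The check isolates the real difficulty: in practice $m_{ii}^{(n)}$ is unknown and must itself be estimated.

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_power(M, n):
    """M**n for n >= 1 by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = mat_mult(result, M)
    return result


def expected_stay_proportions(s, M, n):
    """(3.13): p_ii^(n) = s_i + (1 - s_i) * m_ii^(n)."""
    Mn = mat_power(M, n)
    return [si + (1.0 - si) * Mn[i][i] for i, si in enumerate(s)]


def stayer_fraction(p_nn, M, n):
    """(3.14): s_i = (p_ii^(n) - m_ii^(n)) / (1 - m_ii^(n))."""
    Mn = mat_power(M, n)
    return [(p - Mn[i][i]) / (1.0 - Mn[i][i]) for i, p in enumerate(p_nn)]


M = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
s = [0.4, 0.0, 0.5]
p = expected_stay_proportions(s, M, 8)
print(p, stayer_fraction(p, M, 8))
```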


(The estimator of $p_{ii}^{(n)}$ used by BKM is

$$\bar{p}_{ii}^{(n)} = \sum_{t=1}^{V} w_{ii}^{(n)}(t) \Big/ \sum_{t=1}^{V} w_i(t),$$

where $w_{ii}^{(n)}(t)$ is the number of workers in the $i$th code category in the $t$th quarter who were also in that category in the $(n+t)$th quarter, and $w_i(t)$ is the number of workers in the $i$th code category in the $t$th quarter ($t = 1, 2, \ldots, V$). For $V = 1$, $\bar{p}_{ii}^{(n)} = \hat{p}_{ii}^{(n)}$. The conditions that are necessary in order to justify the use of $\bar{p}_{ii}^{(n)}$ (when $V > 1$) as an estimator of $p_{ii}^{(n)}$ will be discussed in Section 4.2. For simplicity, we take $V = 1$ in the present section.) As $n$ approaches infinity, (3.14) becomes

$$s_i = \left(p_{ii}^{(\infty)} - m_i\right)\Big/\left(1 - m_i\right), \tag{3.15}$$

where

$$p_{ii}^{(\infty)} = \lim_{n \to \infty} p_{ii}^{(n)} \quad \text{and} \quad m_i = \lim_{n \to \infty} m_{ii}^{(n)}.$$

Thus, when $n$ is sufficiently large, we obtain the following estimator of $s_i$:

$$\hat{s}_i = \left(\hat{p}_{ii}^{(n)} - m_i\right)\Big/\left(1 - m_i\right). \tag{3.16}$$
When the $m_i$ are known a priori, (3.16) can be used directly to estimate $s_i$. When the $m_i$ are unknown (as is usually the case), BKM suggest that the $m_i$ be replaced in (3.16) by the BKM estimates of the $m_i$; in the preceding section we recommended different estimators for the $m_i$. For example, the estimate of $s_U$ given by BKM (for females in the 40-44 age classification) was $(.557 - .412)/(1 - .412) = .246$ [3, p. 116]; using the modification suggested herein, the estimate of $s_U$ should be zero, since (a) the BKM estimate of $m_U$ (i.e., .412) should be replaced by .640 (see Table 1 herein), (b) $(.557 - .640)/(1 - .640)$ is negative, and (c) we know that $0 \le s_U \le 1$. (If more accuracy is desired, (3.1) should be used rather than (3.2) or (3.3) to estimate the $m_i$ in (3.16).) Since the code category U was reserved for workers in the study who were "not in covered employment" [3, p. 21], and since all workers in the study were in covered employment at some time during the period of investigation [3, p. 6], there were in fact no workers studied who remained in category U for the entire period of investigation; thus, $s_U$ was actually zero.
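The arithmetic of the category-U example can be checked with a few lines of Python. This is a minimal sketch, assuming $m_i$ is simply supplied (known or already estimated); the function name `stayer_estimate` is hypothetical, and clipping the raw value to [0, 1] reflects the constraint $0 \le s_U \le 1$ used above.

```python
def stayer_estimate(p_nn: float, m_i: float) -> float:
    """Estimator (3.16): s_i' = (p_ii^(n) - m_i) / (1 - m_i), clipped to [0, 1].

    p_nn -- observed proportion of initial category-i workers found in
            category i again after n quarters (n assumed large)
    m_i  -- long-run probability m_i for category i (known or estimated)
    """
    if not 0.0 <= m_i < 1.0:
        raise ValueError("m_i must lie in [0, 1)")
    raw = (p_nn - m_i) / (1.0 - m_i)
    # A proportion of stayers cannot be negative or exceed one.
    return min(max(raw, 0.0), 1.0)


# Figures quoted in the text for category U (females, 40-44).
bkm_value = stayer_estimate(0.557, 0.412)   # (.557 - .412)/(1 - .412), about 0.2466
revised = stayer_estimate(0.557, 0.640)     # raw value is negative, so clipped to 0.0
print(f"{bkm_value:.4f} {revised}")
```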

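We shall now present an upper bound for the $s_i$ ($i = 1, 2, \ldots, I$) based upon data given by BKM but not used there for this purpose; viz., the observed fraction, $f_i^{(n)}$, of workers in the $i$th code category in the initial quarter who remained continuously employed in that category for the next $n$ quarters. Since every stayer remains continuously employed in his initial category, $f_i^{(n)}$ estimates $s_i + (1 - s_i)\, m_{ii}^n \ge s_i$ (see Section 4.1.1).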

The data in Table 2 indicate that the estimates of $s_i$ presented by BKM (for


TABLE 2. BOUNDS AND ESTIMATES FOR THE PROPORTION, $s_i$, OF WORKERS IN THE $i$th CODE CATEGORY WHO ARE STAYERS

(Females, 40-44)

                 Fraction of Workers Continuously Employed
Code Category    For 8 Quarters    For 11 Quarters    BKM Estimate of $s_i$
B                -                 [illegible]        .473
C                .562              .494               .679
D                .624              .591               .691
E                .507              .408               .581
F                .738              .721               .836
G                .490              .424               .519
H                .677              .615               .738
J                .431              .372               .506
K                -                 0                  0
U                -                 -                  .246

Source: [3], pp. 97, 101, 116.








850 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


females, 40-44) were usually larger than the corresponding $f_i^{(n)}$ for $n = 8$ and $n = 11$. If information concerning $f_i^{(n)}$ is available for purposes of estimating unknown parameters, then $f_i^{(n)}$ should replace whatever estimate of $s_i$ is available whenever that estimate is larger than the corresponding $f_i^{(n)}$; this modification will reduce the mean squared error of the estimator. Thus, we obtain the estimator

$$s_i^* = \min\big[f_i^{(n)},\, s_i'\big], \tag{3.17}$$

where $s_i'$ is computed from (3.16) with $m_i$ estimated by the solution of (3.1), or (3.2) when appropriate. (For values of $i$ where $f_i^{(n)}$ is unknown, we take $s_i^* = s_i'$.)
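A one-line sketch of (3.17) follows. The function name is hypothetical, and the numbers are only illustrative: they plug the BKM estimate of $s_C$ from Table 2 in place of $s_C'$ (which the text computes from (3.1) or (3.2) instead), together with the tabulated $f_C^{(8)}$.

```python
from typing import Optional


def bounded_stayer_estimate(s_prime: float, f_n: Optional[float] = None) -> float:
    """Estimator (3.17): s_i* = min[f_i^(n), s_i'].

    If f_i^(n) is not available (f_n is None), s_i* = s_i'.
    """
    return s_prime if f_n is None else min(f_n, s_prime)


print(bounded_stayer_estimate(0.679, 0.562))  # the bound f^(8) is active here
print(bounded_stayer_estimate(0.473))         # no f^(8) reported for category B
```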

Tables 3 and 7 in Section 5 compare the estimates of $s_i$ given by BKM with other estimates that use information concerning $f_i^{(n)}$. We find that the BKM estimates are usually too large.

The estimators of $S$ presented in the present section were functions of the estimator of $M^\infty$, and it was necessary to assume here, as in the preceding section, that $n$ was sufficiently large that $M^{(n)}$ approximates $M^\infty$. The estimator of $S$ presented in Section 4.1.2 does not require this.

3.3 The Estimation of M
The estimator of $M$ presented by BKM is obtained by first noting that

$$p_{ij} = \begin{cases} s_i + (1 - s_i)\, m_{ii} & \text{for } j = i, \\ (1 - s_i)\, m_{ij} & \text{for } j \ne i, \end{cases} \tag{3.18}$$

where $p_{ij}$ is the expected proportion of workers in the $i$th code category in the initial quarter who were in the $j$th category in the following quarter ($i, j = 1, 2, \ldots, I$). Thus

$$m_{ij} = \begin{cases} (p_{ii} - s_i)/(1 - s_i) & \text{for } j = i, \\ p_{ij}/(1 - s_i) & \text{for } j \ne i. \end{cases} \tag{3.19}$$

Let $\hat p_{ij}$ denote the observed proportion of workers in the $i$th code category in the initial quarter who were in the $j$th category in the following quarter. Since $\hat p_{ij}$ will be a consistent estimator of $p_{ij}$ when $w_i$ is large, we can use $\hat p_{ij}$ to estimate $p_{ij}$ in (3.19). Thus, we obtain the following estimator of $m_{ij}$:

$$m_{ij}' = \begin{cases} (\hat p_{ii} - s_i)/(1 - s_i) & \text{for } j = i, \\ \hat p_{ij}/(1 - s_i) & \text{for } j \ne i. \end{cases} \tag{3.20}$$


When the $m_i$ are known, (3.16) can be used directly to estimate $s_i$ in (3.20). When the $m_i$ are unknown, (3.16) can be used after the $m_i$ in (3.16) have been replaced by the estimators recommended in Section 3.1. If information concerning the $f_i^{(n)}$ is available, then (3.17) can be used to modify the estimate of $s_i$ used in (3.20). Since the estimates of $s_i$ recommended here differ from those recommended by BKM, the corresponding estimates of the $m_{ij}$ will also differ. The magnitude of these differences will be discussed in Section 5, where comparisons are made between the various methods of estimation presented in this section and those in Section 4.
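Given an estimate of $s_i$, (3.20) turns one row of observed one-quarter proportions into a row of mover transition probabilities. The sketch below uses a hypothetical function name and hypothetical numbers; a useful check is that the resulting row still sums to one, because the $\hat p_{ij}$ in a row sum to one.

```python
def mover_row_from_stayers(p_row: list[float], i: int, s_i: float) -> list[float]:
    """Estimator (3.20) for row i.

    p_row -- observed proportions p_hat_ij (j = 0..I-1) for initial category i
    s_i   -- an estimate of the proportion of stayers, 0 <= s_i < 1
    """
    if not 0.0 <= s_i < 1.0:
        raise ValueError("s_i must lie in [0, 1)")
    return [
        (p - s_i) / (1.0 - s_i) if j == i else p / (1.0 - s_i)
        for j, p in enumerate(p_row)
    ]


row = mover_row_from_stayers([0.7, 0.2, 0.1], i=0, s_i=0.4)  # hypothetical data
print([round(x, 4) for x in row], round(sum(row), 10))
```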







4. MORE DIRECT METHODS OF ESTIMATION 


4.1 Estimation Methods that Use Information Concerning the $f_i^{(n)}$


In Section 3.2 an upper bound for $s_i$ was obtained using data concerning the fraction, $f_i^{(n)}$, of workers in the $i$th code category ($i = 1, 2, \ldots, I$) in the initial quarter who remained continuously employed in that code category for the next $n$ quarters. In the present section, we shall use these data to derive more direct methods of estimation. We shall obtain consistent estimators of $M$, $S$, and $M^\infty$ (when the $w_i$ are large). These estimators will be consistent regardless of whether $n$ is small or large (as long as $n > 1$). This cannot be said for the estimators discussed in Section 3; justification for the estimators presented there depended upon $n$ being sufficiently large that $M^{(n)}$ approximated $M^\infty$.

4.1.1 The estimation of M. For simplicity we denote $f_i^{(n)}$ by $f_i$. When $w_i$ approaches infinity, $f_i$ is a consistent estimator of $s_i + (1 - s_i)\, m_{ii}^n$, and $1 - f_i$ is a consistent estimator of $(1 - s_i)(1 - m_{ii}^n)$. Since $\hat p_{ij}$ is a consistent estimator of $p_{ij}$ given by (3.18), we see that $h_{ii} = (\hat p_{ii} - f_i)/(1 - f_i)$ is a consistent estimator of $(m_{ii} - m_{ii}^n)/(1 - m_{ii}^n)$, and that $h_{ij} = \hat p_{ij}/(1 - f_i)$ is a consistent estimator of $m_{ij}/(1 - m_{ii}^n)$ for $j \ne i$. We note that $h_{ij}$ is the proportion of workers in the $i$th code category in the initial quarter who were in the $j$th category in the following quarter ($i, j = 1, 2, \ldots, I$), considering only workers in the $i$th category in the initial quarter who were not continuously employed in that category for the next $n$ quarters. When $n > 1$, the maximum likelihood estimates of the $m_{ij}$ based upon the conditional distribution of the $h_{ij}$ (given the $f_i$) can be obtained by first solving for $\hat m_{ii}$ the equation

$$h_{ii} = \frac{\hat m_{ii} - \hat m_{ii}^{\,n}}{1 - \hat m_{ii}^{\,n}} \tag{4.1}$$

for $i = 1, 2, \ldots, I$, and then estimating $m_{ij}$ by

$$\hat m_{ij} = h_{ij}\big(1 - \hat m_{ii}^{\,n}\big) = h_{ij}(1 - \hat m_{ii})/(1 - h_{ii}), \qquad j \ne i, \tag{4.2}$$

where $\hat m_{ii}$ is the solution of (4.1). Equation (4.1) can be rewritten as

$$\hat m_{ii} = h_{ii} + (1 - h_{ii})\, \hat m_{ii}^{\,n}, \tag{4.3}$$

which has a unique solution in the interval $0 < \hat m_{ii} < 1$ when $0 < h_{ii} < 1 - 1/n$. The solution $\hat m_{ii}$ is such that

$$\hat m_{ii} - h_{ii} = (1 - h_{ii})\, \hat m_{ii}^{\,n} > 0. \tag{4.4}$$

When $h_{ii} \ge 1 - 1/n$, take $\hat m_{ii} = 1$. Equation (4.3) can be solved by graphical methods (by plotting $\hat m_{ii} - h_{ii}$ and $(1 - h_{ii})\hat m_{ii}^{\,n}$) or by successive approximations. When $w_i$ is large (i.e., $w_i \to \infty$), $\hat m_{ij}$ is a consistent estimator of $m_{ij}$. When $n$ is large, a simple approximation for $\hat m_{ij}$ is

$$\hat{\hat m}_{ij} = h_{ij} \qquad (i, j = 1, 2, \ldots, I). \tag{4.5}$$
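Solving (4.3) by successive approximations is straightforward, because the iteration $m \leftarrow h_{ii} + (1 - h_{ii}) m^n$ started at $m = h_{ii}$ increases monotonically to the root in $(0, 1)$ whenever $0 < h_{ii} < 1 - 1/n$. The sketch below (hypothetical function names, pure Python) then applies the first form of (4.2). For $n = 2$ the root has the closed form $h_{ii}/(1 - h_{ii})$, which gives an easy check.

```python
def solve_mii(h_ii: float, n: int, tol: float = 1e-13, max_iter: int = 1_000_000) -> float:
    """Root of (4.3), m = h_ii + (1 - h_ii) * m**n, in (0, 1); 1.0 if h_ii >= 1 - 1/n."""
    if n < 2:
        raise ValueError("the method requires n > 1")
    if h_ii >= 1.0 - 1.0 / n:
        return 1.0                      # the rule stated in the text
    m = h_ii                            # start below the root
    for _ in range(max_iter):
        nxt = h_ii + (1.0 - h_ii) * m ** n
        if abs(nxt - m) < tol:
            return nxt
        m = nxt
    return m


def mover_row_ml(h_row: list[float], i: int, n: int) -> list[float]:
    """(4.1)-(4.2): m_hat_ii from (4.3) and m_hat_ij = h_ij * (1 - m_hat_ii**n) for j != i."""
    m_ii = solve_mii(h_row[i], n)
    factor = 1.0 - m_ii ** n
    return [m_ii if j == i else h * factor for j, h in enumerate(h_row)]


h = [0.3, 0.45, 0.25]                   # hypothetical h_ij for one row (sums to 1)
est = mover_row_ml(h, i=0, n=2)
print([round(x, 6) for x in est])       # m_hat_00 should equal 0.3 / 0.7 when n = 2
```

Using the first form of (4.2) avoids dividing by $1 - h_{ii}$, so the case $\hat m_{ii} = 1$ needs no special handling.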

4.1.2 The estimation of S. It is now possible to estimate $S$ using the estimator of $M$ suggested in the preceding section rather than the estimator of $M^\infty$ described in Section 3.2. From (3.18) we have that

$$\frac{p_{ii} - m_{ii}}{1 - m_{ii}} = s_i \tag{4.6}$$

for $i = 1, 2, \ldots, I$. Using the estimator of $m_{ii}$ presented in the preceding section, we obtain the following estimator of $s_i$:

$$\hat s_i = \frac{\hat p_{ii} - \hat m_{ii}}{1 - \hat m_{ii}} \qquad \text{for } \hat m_{ii} < 1. \tag{4.7}$$

Using (4.3), it is easy to see that (4.7) can be rewritten as

$$\hat s_i = \frac{f_i - \hat m_{ii}^{\,n}}{1 - \hat m_{ii}^{\,n}}. \tag{4.8}$$

(If $\hat m_{ii} = 1$, take $\hat s_i = f_i$.) We have that $\hat s_i \le f_i$. When $w_i$ is large, $\hat s_i$ is a consistent estimator of $s_i$. When $n$ is large, a simple approximation for $\hat s_i$ is

$$\hat{\hat s}_i = f_i. \tag{4.9}$$
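The equivalence of (4.7) and (4.8), and the approximation (4.9), can be checked numerically. The sketch below reuses the closed-form root $h_{ii}/(1 - h_{ii})$ of (4.3) for $n = 2$ and recovers $\hat p_{ii}$ from the definition $h_{ii} = (\hat p_{ii} - f_i)/(1 - f_i)$; function names and numbers are hypothetical.

```python
def stayer_from_p(p_ii: float, m_ii: float) -> float:
    """(4.7): s_hat_i = (p_hat_ii - m_hat_ii) / (1 - m_hat_ii), for m_hat_ii < 1."""
    return (p_ii - m_ii) / (1.0 - m_ii)


def stayer_from_f(f_i: float, m_ii: float, n: int) -> float:
    """(4.8): s_hat_i = (f_i - m_hat_ii**n) / (1 - m_hat_ii**n); s_hat_i = f_i if m_hat_ii = 1."""
    if m_ii >= 1.0:
        return f_i
    mn = m_ii ** n
    return (f_i - mn) / (1.0 - mn)


n, h_ii, f_i = 2, 0.3, 0.4
m_ii = h_ii / (1.0 - h_ii)            # root of (4.3) when n = 2
p_ii = f_i + h_ii * (1.0 - f_i)       # inverts h_ii = (p_ii - f_i)/(1 - f_i)
a = stayer_from_p(p_ii, m_ii)
b = stayer_from_f(f_i, m_ii, n)
print(round(a, 6), round(b, 6), "approximation (4.9):", f_i)
```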


4.1.3 The estimation of $M^\infty$. The estimators of $M^\infty$ presented in Section 3.1 were then used to estimate $M$. It is now possible to use the estimator of $M$ presented in Section 4.1.1 to estimate $M^\infty$.

Let $m = (m_1, m_2, \ldots, m_I)$ denote the $I$ probabilities forming each row of the matrix $M^\infty$. Thus

$$M^\infty = \begin{pmatrix} m_1 & m_2 & \cdots & m_I \\ \vdots & \vdots & & \vdots \\ m_1 & m_2 & \cdots & m_I \end{pmatrix}.$$

From the theory of Markov chains (see, e.g., [6]), we know that $mM = m$. The estimator, $\hat M$, of $M$ suggested in Section 4.1.1 can be used to form the system of equations

$$\hat m \hat M = \hat m, \qquad \sum_{i=1}^{I} \hat m_i = 1. \tag{4.10}$$

This system of equations can be used to determine $\hat m = (\hat m_1, \hat m_2, \ldots, \hat m_I)$. Thus, $M^\infty$ can be estimated directly from $\hat M$ by solving (4.10). When $w_i$ is large ($i = 1, 2, \ldots, I$), the estimator of $M^\infty$ obtained in this way is consistent.
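The text solves the linear system (4.10) directly; the sketch below instead approximates $\hat m$ by repeatedly multiplying a probability vector by $\hat M$ (power iteration), which reaches the same solution when the estimated chain is irreducible and aperiodic. That assumption, the function name, and the two-category matrix are all hypothetical.

```python
def stationary_distribution(M: list[list[float]], tol: float = 1e-13,
                            max_iter: int = 1_000_000) -> list[float]:
    """Approximate m with m M = m and sum(m) = 1 by power iteration.

    M must be row-stochastic; convergence assumes an irreducible, aperiodic chain.
    """
    k = len(M)
    m = [1.0 / k] * k
    for _ in range(max_iter):
        nxt = [sum(m[i] * M[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(nxt, m)) < tol:
            return nxt
        m = nxt
    return m


M_hat = [[0.9, 0.1],
         [0.2, 0.8]]
m_hat = stationary_distribution(M_hat)
print([round(x, 6) for x in m_hat])   # each row of the estimated M-infinity
```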


4.2 Estimation Methods that Use Additional Information 


The estimators $\hat{\hat m}_{ij}$ and $\hat m_{ij}$ (in Section 4.1.1) and $m_{ij}'$ (in Section 3.3) are all based, in part, upon the $\hat p_{ij}$ computed from data obtained in the initial and the following (i.e., the second) quarter. When data are available for the initial, the second, the third, ..., the $(T+1)$th quarter, the question arises whether $\hat p_{ij}$ should be replaced, in the various estimation methods, by

$$\bar p_{ij} = \sum_{t=1}^{T} w_{ij}(t) \Big/ \sum_{t=1}^{T} w_i(t),$$

where $w_i(t)$ is the number of workers in the $i$th code category in the $t$th quarter, and $w_{ij}(t)$ is the number of workers in the $i$th code category in the $t$th quarter who were in the $j$th category in the $(t+1)$th quarter. (For $T = 1$, $\bar p_{ij} = \hat p_{ij}$.)







Since justification for the use of $\hat p_{ij}$ in the various estimation methods depended upon the fact that $\hat p_{ij}$ converged in probability to $p_{ij}$ given by (3.18), it is necessary to examine whether $\bar p_{ij}$ also converges in probability to $p_{ij}$.

Let $v_i(t)$ denote the expected number of movers in the $i$th code category in the $t$th quarter. Let

$$\theta_i(t) = v_i(t)/w_i, \qquad \theta_i = \sum_{t=1}^{T} \theta_i(t)/T, \qquad \rho_i = w_i/w,$$

where

$$w = \sum_{i=1}^{I} w_i$$

is the total number of workers studied. We see that

$$E\Big[\sum_{t=1}^{T} w_i(t)\Big]\Big/w = T\rho_i\,[s_i + \theta_i],$$

$$E\Big[\sum_{t=1}^{T} w_{ij}(t)\Big]\Big/w = \begin{cases} T\rho_i\,[s_i + \theta_i m_{ii}] & \text{for } j = i, \\ T\rho_i\,\theta_i m_{ij} & \text{for } j \ne i. \end{cases} \tag{4.11}$$

When $w$ approaches infinity (with $\theta_i$, $s_i$, and $\rho_i$ fixed), $\bar p_{ij}$ converges in probability to

$$q_{ij} = \begin{cases} [s_i + \theta_i m_{ii}]/[s_i + \theta_i] & \text{for } j = i, \\ \theta_i m_{ij}/[s_i + \theta_i] & \text{for } j \ne i. \end{cases} \tag{4.12}$$

Thus $\bar p_{ij}$ will converge in probability to $p_{ij}$ if and only if

$$\theta_i = 1 - s_i. \tag{4.13}$$

From (4.11) we see that (4.13) is equivalent to

$$w_i = E\Big[\sum_{t=1}^{T} w_i(t)\Big]\Big/T. \tag{4.14}$$


Condition (4.14) states that the number of workers in the $i$th code category in the initial quarter must equal the average of the expected numbers of workers in the $i$th code category in quarters $1, 2, \ldots, T$. The $\bar p_{ij}$ converge in probability to the $p_{ij}$ if and only if this condition is met. Thus, if this condition is met, we can justify replacing $\hat p_{ij}$ by $\bar p_{ij}$ in the various estimation methods discussed earlier herein; for example, the estimator $\hat{\hat m}_{ij}$ can be replaced by

$$\tilde{\tilde m}_{ij} = \begin{cases} (\bar p_{ii} - f_i)/(1 - f_i) & \text{for } j = i, \\ \bar p_{ij}/(1 - f_i) & \text{for } j \ne i, \end{cases} \tag{4.15}$$

and the estimator $\hat m_{ij}$ can be replaced by $\tilde m_{ij}$ obtained by solving the equations

$$\tilde m_{ii} = \tilde{\tilde m}_{ii} + (1 - \tilde{\tilde m}_{ii})\, \tilde m_{ii}^{\,n} \qquad \text{for } j = i, \tag{4.16}$$

and by

$$\tilde m_{ij} = \tilde{\tilde m}_{ij}\big(1 - \tilde m_{ii}^{\,n}\big) = \tilde{\tilde m}_{ij}(1 - \tilde m_{ii})/(1 - \tilde{\tilde m}_{ii}) \qquad \text{for } j \ne i, \tag{4.17}$$

where $n$ is, as in (4.1) and (4.2), the number of quarters observed (after the initial quarter) in order to compute the statistic $f_i = f_i^{(n)}$.

We noted in Section 4.1 that $\hat m_{ij}$ and $\hat s_i$ were consistent estimators of $m_{ij}$ and $s_i$, respectively, when $w_i \to \infty$ and the length of time, $n$, was fixed. If condition (4.14) is met, then $\tilde m_{ij}$ is also a consistent estimator of $m_{ij}$, and

$$\tilde s_i = \frac{\bar p_{ii} - \tilde m_{ii}}{1 - \tilde m_{ii}} = \frac{f_i - \tilde m_{ii}^{\,n}}{1 - \tilde m_{ii}^{\,n}} \tag{4.18}$$

is a consistent estimator of $s_i$. The only estimators presented herein that are consistent estimators of $m_{ij}$ and $s_i$ when the lengths of time $n$ and $T$ are fixed ($w_i \to \infty$) are $\hat m_{ij}$ and $\hat s_i$, and (when (4.14) holds true) $\tilde m_{ij}$ and $\tilde s_i$, respectively.

We shall now present a different method of estimation, which does not require that (4.14) hold true, but which nevertheless is based, in part, upon the $\bar p_{ij}$. Let $c_i$ denote the number of workers in the $i$th code category in the initial quarter who were continuously employed in that category for the next $n$ quarters. Thus the number of workers who remained in a single category for the first $(n+1)$ quarters is

$$c = \sum_{i=1}^{I} c_i.$$

If the data for these $c$ workers are separated from the data for the remaining $w - c$ workers, a recomputation of $\bar p_{ij}$ based upon the data for the $w - c$ workers gives the following (when $n = T$):

$$\check m_{ij} = \begin{cases} \Big[\sum_{t=1}^{T} w_{ii}(t) - Tc_i\Big] \Big/ \Big[\sum_{t=1}^{T} w_i(t) - Tc_i\Big] & \text{for } j = i, \\ \sum_{t=1}^{T} w_{ij}(t) \Big/ \Big[\sum_{t=1}^{T} w_i(t) - Tc_i\Big] & \text{for } j \ne i, \end{cases} \tag{4.19}$$

which can be rewritten as

$$\check m_{ij} = \begin{cases} [\bar p_{ii} - f_i g_i]/[1 - f_i g_i] & \text{for } j = i, \\ \bar p_{ij}/[1 - f_i g_i] & \text{for } j \ne i, \end{cases} \tag{4.20}$$

where $f_i = c_i/w_i$, $g_i = w_i/\bar w_i$, and

$$\bar w_i = \sum_{t=1}^{T} w_i(t)/T.$$


The statistic $\check m_{ij}$ is the ratio of the total number of workers in the $i$th code category in the $t$th quarter who were in the $j$th category in the $(t+1)$th quarter ($t = 1, 2, \ldots, T$) to the total number of workers in the $i$th category in the $t$th quarter ($t = 1, 2, \ldots, T$), excluding all workers who remained continuously employed in a single category throughout the $(T+1)$ quarters. It can be seen that $\check m_{ij}$ is the maximum likelihood estimator of $m_{ij}$ based upon all the data available for the first $(T+1)$ quarters. We also find that $f_i = \hat{\hat s}_i$ is the maximum likelihood estimator of $s_i$ based upon these data. The estimators $\check m_{ij}$ and $\hat{\hat s}_i$ are consistent estimators of $m_{ij}$ and $s_i$, respectively, when $T \to \infty$.
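The two forms (4.19) and (4.20) of $\check m_{ij}$ can be compared on a small data set with $T = n = 2$ and two categories. The counts below are invented for illustration, with $w_i = w_i(1)$, and the $c_i$ continuously employed workers removed from every quarter's diagonal count; the function names are hypothetical.

```python
def movers_only_counts(w_ij: list[list[int]], w_i: list[int], c_i: int, i: int) -> list[float]:
    """(4.19): w_ij[t][j] = w_ij(t+1), w_i[t] = w_i(t+1), t = 0..T-1; assumes n = T."""
    T = len(w_i)
    denom = sum(w_i) - T * c_i
    row = []
    for j in range(len(w_ij[0])):
        num = sum(w_ij[t][j] for t in range(T))
        if j == i:
            num -= T * c_i          # drop the continuously employed workers' quarters
        row.append(num / denom)
    return row


def movers_only_rates(p_bar_row: list[float], f_i: float, g_i: float, i: int) -> list[float]:
    """(4.20): same estimate from p_bar_ij, f_i = c_i / w_i and g_i = w_i / w_bar_i."""
    d = 1.0 - f_i * g_i
    return [(p - f_i * g_i) / d if j == i else p / d for j, p in enumerate(p_bar_row)]


w_ij = [[80, 20], [85, 25]]         # hypothetical counts for category i = 0
w_i = [100, 110]                    # w_i(1) = w_i = 100, w_i(2) = 110
c_i, T = 40, len(w_i)
total = sum(w_i)
p_bar = [sum(w_ij[t][j] for t in range(T)) / total for j in range(2)]
f_i = c_i / w_i[0]
g_i = w_i[0] / (total / T)
a = movers_only_counts(w_ij, w_i, c_i, 0)
b = movers_only_rates(p_bar, f_i, g_i, 0)
print([round(x, 6) for x in a], [round(x, 6) for x in b])
```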

In closing this section, we indicate how data available for the initial, the second, ..., the $(T+1)$th quarter were actually used by BKM [3]. Firstly, in estimating $m_{ij}$ (see (3.20) herein), BKM replaced $\hat p_{ij}$ by $\bar p_{ij}$. As noted earlier in this section, justification for this replacement required that condition (4.14) hold true. (This condition was not explicitly mentioned, however, by BKM.) Secondly, in estimating $s_i$ (see (3.16)), BKM replaced $\hat p_{ii}^{(n)}$ by $\bar p_{ii}^{(n)}$, as defined in Section 3.2 herein. By an argument similar to that presented above, it is possible to prove that justification for this replacement requires that the following condition hold true:

$$w_i = E\Big[\sum_{t=1}^{V} w_i(t)\Big]\Big/V. \tag{4.21}$$

Thirdly, in estimating $m_i$, the $x_{ij}$ ($i \ne j$) were replaced by $\bar x_{ij}$ (defined in Section 3.1) in the cross-classification table used to calculate the statistics $x_{i\cdot}$ and $x_{\cdot i}$ appearing in (3.1). When the $x_{ij}$ are replaced by $\bar x_{ij}$, estimates of $m_i$ can be obtained from (3.2) if

$$m_i = \sum_{t=1}^{V} v_i(t) \Big/ \sum_{t=1}^{V} \sum_{k=1}^{I} v_k(t). \tag{4.22}$$

Let $\bar m_i$ denote the estimator of $m_i$ obtained from (3.1) (or (3.2) when appropriate) using $\bar x_{ij}$ rather than $x_{ij}$. When (4.21) holds true, we can replace (3.16) by

$$\bar s_i' = \frac{\bar p_{ii}^{(n)} - \bar m_i}{1 - \bar m_i}, \tag{4.23}$$

and we can replace (3.17) by

$$\bar s_i^* = \min\big[f_i^{(n)},\, \bar s_i'\big]. \tag{4.24}$$

When (4.14) and (4.21) hold true, the estimator $m_{ij}'$ given by (3.20) can be replaced by

$$\bar m_{ij}' = \begin{cases} (\bar p_{ii} - \bar s_i')/(1 - \bar s_i') & \text{for } j = i, \\ \bar p_{ij}/(1 - \bar s_i') & \text{for } j \ne i, \end{cases} \tag{4.25}$$

or by

$$\bar m_{ij}^* = \begin{cases} (\bar p_{ii} - \bar s_i^*)/(1 - \bar s_i^*) & \text{for } j = i, \\ \bar p_{ij}/(1 - \bar s_i^*) & \text{for } j \ne i. \end{cases} \tag{4.26}$$

The estimator of $m_{ij}$ actually used by BKM was $\bar m_{ij}'$ where, however, in computing $\bar s_i'$, the value of $\bar m_i$ was obtained from (3.3) rather than from (3.1) or (3.2).










5. COMPARISONS OF THE INDIRECT AND THE MORE DIRECT METHODS OF ESTIMATION

In order to use the procedures presented in Section 3 herein or those presented by BKM [3], data are required concerning $\hat p_{ij}$, $\hat p_{ii}^{(n)}$ ($i = 1, 2, \ldots, I$) and $x_{ij}$ ($i \ne j$) (or $\bar p_{ij}$, $\bar p_{ii}^{(n)}$, and $\bar x_{ij}$) for $n$ sufficiently large that $M^{(n)}$ approximates $M^\infty$. The procedures in Section 4.1 required data concerning the $\hat p_{ij}$ and the $f_i$ ($i = 1, 2, \ldots, I$) for some $n > 1$, and the procedures recommended in Section 4.2 required data concerning $\bar p_{ij}$, $f_i$, and $g_i$ ($i = 1, 2, \ldots, I$). For the sake of simplicity, we shall assume when comparing the estimation methods in Section 3 with those in Section 4 that $n$ is large, although the methods in Section 4.1 do not require this. (In Section 4.1, consistent estimators of $M$ and $S$ were obtained when $n$ was a fixed constant and $w_i \to \infty$.) One possible advantage of the methods of Section 3 is that in some cases it may cost less to obtain the data required (i.e., the $\hat p_{ij}$, $\hat p_{ii}^{(n)}$, $x_{ij}$) for these methods than to obtain the data required for the other methods (i.e., the $\hat p_{ij}$, $f_i$, or the $\bar p_{ij}$, $f_i$, $g_i$). On the other hand, the computations described in Sections 4.1 and 4.2 are somewhat simpler than those in Section 3.

When $n$ is large, the estimator of $s_i$ suggested in Sections 4.1 and 4.2 is $\hat{\hat s}_i$. The variance of $\hat{\hat s}_i$ is

$$\sigma^2_{\hat{\hat s}_i} = m_{ii}^n\big(1 - m_{ii}^n\big)(1 - s_i)/w_i. \tag{5.1}$$

When $n \to \infty$, this variance approaches zero, and $\hat{\hat s}_i$ converges in probability to $s_i$. Thus, $\hat{\hat s}_i$ is a consistent estimator of $s_i$ when $n$ is large even if $w_i$ is not necessarily large.

The estimator, $s_i'$, of $s_i$ given in Section 3.2, when $m_i$ is assumed known, will have a variance equal to

$$\sigma^2_{s_i'} = m_{ii}^{(n)}\big(1 - m_{ii}^{(n)}\big)(1 - s_i)\big/w_i(1 - m_i)^2, \tag{5.2}$$

which converges when $n \to \infty$ to

$$\lim_{n \to \infty} \sigma^2_{s_i'} = m_i(1 - s_i)/w_i(1 - m_i). \tag{5.3}$$

The ratio, $Z$, of the variance (5.2) to (5.1) is

$$Z = m_{ii}^{(n)}\big(1 - m_{ii}^{(n)}\big)\Big/m_{ii}^n\big(1 - m_{ii}^n\big)(1 - m_i)^2, \tag{5.4}$$

which approaches infinity when $n \to \infty$. Thus $s_i'$ has a larger asymptotic variance, when $m_i$ is assumed known, than does $\hat{\hat s}_i$. When $m_i$ is estimated by (3.1), $s_i'$ will not converge in probability to $s_i$ as $n \to \infty$ when $w_i$ is fixed. (If $w_i$ is fixed, the variances of $\hat p_{ii}^{(n)}$ and of the estimator of $m_i$ will not approach zero as $n \to \infty$, and their correlation will not approach one.) Hence, in this case too the asymptotic variance of $s_i'$ is larger than that of $\hat{\hat s}_i$ ($n \to \infty$). When the estimator $s_i'$ is modified by using information concerning upper bounds for the $s_i$, as in (3.17), the modified estimator, $s_i^*$, will have a smaller mean squared error than $s_i'$, but $s_i^*$ will nevertheless not converge in probability to $s_i$ as $n \to \infty$ ($w_i$ fixed); thus the asymptotic variance of $s_i^*$ is also larger than that of $\hat{\hat s}_i$ ($n \to \infty$).
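To see why the ratio $Z$ in (5.4) grows without bound, the sketch below evaluates it for a hypothetical two-category mover matrix, computing $m_{ii}^{(n)}$ from the $n$-th matrix power and approximating $m_i$ by $m_{ii}^{(n)}$ for a very large $n$. Note that $m_{ii}^n$ (an ordinary power: remaining in category $i$ for $n$ consecutive quarters) and $m_{ii}^{(n)}$ (the $(i, i)$ element of $M^n$) are different quantities. Function names and values are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    k = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(k)) for c in range(k)] for r in range(k)]


def mat_pow(M, n):
    """M**n by repeated multiplication (n >= 1)."""
    P = M
    for _ in range(n - 1):
        P = mat_mul(P, M)
    return P


def variance_ratio(M, i, n, m_i):
    """(5.4): Z = m_ii^(n)(1 - m_ii^(n)) / [m_ii^n (1 - m_ii^n)(1 - m_i)^2]."""
    mn_step = mat_pow(M, n)[i][i]   # m_ii^(n): (i, i) element of M^n
    stay = M[i][i] ** n             # m_ii^n: n consecutive quarters in category i
    return mn_step * (1 - mn_step) / (stay * (1 - stay) * (1 - m_i) ** 2)


M = [[0.7, 0.3], [0.4, 0.6]]        # hypothetical mover transition matrix
m_0 = mat_pow(M, 200)[0][0]         # approximates the limit m_0
for n in (4, 8, 12, 16):
    print(n, round(variance_ratio(M, 0, n, m_0), 2))
```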










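The estimator $\hat s_i$ of $s_i$ given in Section 4.1 differs from $\hat{\hat s}_i$ by the following amount:

$$\hat s_i - \hat{\hat s}_i = \begin{cases} 0 & \text{if } \hat m_{ii} = 1, \\ -(1 - f_i)\, \hat m_{ii}^{\,n}/(1 - \hat m_{ii}^{\,n}) & \text{if } \hat m_{ii} < 1. \end{cases} \tag{5.5}$$

When $w_i$ is fixed, we have that

$$\big|\hat s_i - \hat{\hat s}_i\big| \le (1 - s_i)\, z_i^n/(1 - z_i^n) \qquad \text{for } n > w_i(1 - s_i), \tag{5.6}$$

where $z_i < 1$ is the solution of

$$z_i = r_i + (1 - r_i)\, z_i^n, \tag{5.7}$$

with $1 - r_i = 1/(n_i - 1)$ and $n_i = w_i(1 - s_i) + 1$. Since the right-hand side of inequality (5.6) converges to zero when $n \to \infty$, it follows that $|\hat s_i - \hat{\hat s}_i|$ converges in probability to zero. Thus, $\hat s_i$ is also a consistent estimator of $s_i$ when $n$ is large even if $w_i$ is not necessarily large.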

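When $n$ is large, the estimator of $m_{ij}$ suggested in Section 4.1 is $\hat{\hat m}_{ij}$. The asymptotic variance of $\hat{\hat m}_{ij}$, when $w_i \to \infty$, is of the following form:

$$\sigma^2_{\hat{\hat m}_{ij}} = \mu_{ij}(1 - \mu_{ij})\big/w_i\big(1 - m_{ii}^n\big)(1 - s_i), \tag{5.8}$$

where $\mu_{ij} = m_{ij}/(1 - m_{ii}^n)$ for $j \ne i$ and $\mu_{ii} = (m_{ii} - m_{ii}^n)/(1 - m_{ii}^n)$. This asymptotic variance converges when $n \to \infty$ to

$$\lim_{n \to \infty} \sigma^2_{\hat{\hat m}_{ij}} = m_{ij}(1 - m_{ij})/w_i(1 - s_i). \tag{5.9}$$

The estimator $\hat{\hat m}_{ij}$ converges in probability to $m_{ij}$ when $w_i \to \infty$ and $n \to \infty$.

The estimator $m_{ij}'$ of $m_{ij}$ given in Section 3.3 will have an asymptotic variance of the following form:

$$\sigma^2_{m_{ij}'} = \frac{1}{w_i(1 - s_i)} \times \begin{cases} m_{ii}(1 - m_{ii}) + \dfrac{m_i(1 - m_{ii})^2}{1 - m_i} & \text{for } j = i, \\[2ex] m_{ij}(1 - m_{ij}) + \dfrac{m_i\, m_{ij}^2}{1 - m_i} & \text{for } j \ne i, \end{cases} \tag{5.10}$$

when $m_i$ is assumed known, and when $w_i \to \infty$ and $n \to \infty$. The asymptotic variance (5.10) is larger than (5.9), and the relative increase, $R$, in this variance (when (5.10) is compared with (5.9)) is

$$R = \begin{cases} \dfrac{m_i}{1 - m_i} \cdot \dfrac{1 - m_{ii}}{m_{ii}} & \text{for } j = i, \\[2ex] \dfrac{m_i}{1 - m_i} \cdot \dfrac{m_{ij}}{1 - m_{ij}} & \text{for } j \ne i. \end{cases} \tag{5.11}$$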

Thus, the asymptotic variance of $m_{ij}'$, when $m_i$ is assumed known, is larger than that of $\hat{\hat m}_{ij}$. When $m_i$ is estimated by (3.1), the asymptotic variance of $m_{ij}'$ is also larger than that of $\hat{\hat m}_{ij}$; we give here the following lower bound for the asymptotic variance of $m_{ij}'$:

$$\sigma^2_{m_{ij}'} > m_{ij}(1 - m_{ij})/w_i(1 - s_i). \tag{5.12}$$
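The relative increase (5.11) is easy to tabulate. The sketch below (hypothetical function name and values) shows, for example, that when $m_i = 0.5$ the first factor equals one, so the increase for $j \ne i$ is just the odds $m_{ij}/(1 - m_{ij})$.

```python
def relative_increase(m_i: float, m_ij: float, diagonal: bool) -> float:
    """(5.11): relative increase R of the variance (5.10) over (5.9).

    diagonal=True  -> R = [m_i/(1-m_i)] * [(1-m_ii)/m_ii]   (j == i; pass m_ij = m_ii)
    diagonal=False -> R = [m_i/(1-m_i)] * [m_ij/(1-m_ij)]   (j != i)
    """
    odds_i = m_i / (1.0 - m_i)
    return odds_i * ((1.0 - m_ij) / m_ij if diagonal else m_ij / (1.0 - m_ij))


print(relative_increase(0.5, 0.8, diagonal=True))    # approximately 0.25
print(relative_increase(0.5, 0.1, diagonal=False))   # approximately 1/9
```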


The estimators $\hat{\hat m}_{ij}$ and $m_{ij}'$ are not consistent estimators of $m_{ij}$ when $n \to \infty$ and $w_i$ is fixed; the estimator $\check m_{ij}$ given in Section 4.2 is. The asymptotic variance of $\check m_{ij}$ is of the form

$$\sigma^2_{\check m_{ij}} = m_{ij}(1 - m_{ij})\big/n\big[E(\bar w_i) - w_i s_i\big] \tag{5.13}$$

when $n \to \infty$. Thus $\check m_{ij}$ is preferable (when $n \to \infty$) to $\hat{\hat m}_{ij}$ and $m_{ij}'$ as an estimator of $m_{ij}$. If condition (4.14) holds true, then the estimators $\tilde{\tilde m}_{ij}$ and $\tilde m_{ij}$ are also consistent estimators of $m_{ij}$ when $n = T \to \infty$. (Actually, when $T \to \infty$, (4.14) can be replaced by a weaker condition, viz.,

$$w_i = \lim_{T \to \infty} E\Big[\sum_{t=1}^{T} w_i(t)\Big]\Big/T, \tag{5.14}$$

but we shall not go into these details here.)

We now present numerical comparisons of some of the estimates discussed herein. These estimates were computed from the data given by BKM [3] for females in the 40-44 age classification. Numerical values of $\bar p_{ij}$, $\bar p_{ii}^{(n)}$, $\bar x_{ij}$, $f_i$, and $g_i$ can be computed from the BKM data; the estimates presented here are based upon these statistics. (The $\hat p_{ij}$, $\hat p_{ii}^{(n)}$, $x_{ij}$ were not given by BKM, and the numerical values of estimates based upon these statistics will not be given here.) Tables 3, 4, 5, and 6 compare the various estimates of $s_i$, $m_{ii}$, $m_{iU}$, and $p_{iU}^{(11)}$, respectively. For $s_i$ (Table 3), the BKM estimates were obtained from (4.23) with $n = 8$, $V = 4$, and $\bar m_i$ computed from (3.3) rather than (3.1); the $\bar s_i^*$ were obtained from (4.24) with $n = 8$, $V = 4$, and $\bar m_i$ computed from (3.2); the $\hat{\hat s}_i$ were obtained from (4.9) with $n = 11$; and the $\tilde s_i$ were obtained from (4.18) with $n = T = 11$. For $m_{ii}$ (Table 4), the BKM estimates were obtained from (4.25) with $T = 11$, $n = 8$, $V = 4$, and $\bar m_i$ computed from (3.3); the $\bar m_{ii}^*$ were obtained from (4.26) with $T = 11$, $n = 8$, $V = 4$, and $\bar m_i$ computed from (3.2); and the $\tilde{\tilde m}_{ii}$, $\check m_{ii}$, and $\tilde m_{ii}$ were obtained from (4.15), (4.20), and (4.16), respectively, with $T = n = 11$. For $m_{iU}$ (Table 5) the estimates were obtained in a similar fashion. For $p_{iU}^{(11)}$
TABLE 3. ESTIMATES OF THE PROPORTIONS, $s_i$, OF WORKERS IN THE $i$th CODE CATEGORY IN THE INITIAL QUARTER WHO ARE STAYERS

(Females, 40-44)

Code Category   BKM Estimate   $\bar s_i^*$   $\hat{\hat s}_i$   $\tilde s_i$
B               .47            .47            .18                .16
C               .68            .56            .49                .46
D               .69            .62            .59                .56
E               .58            .51            .41                .35
F               .84            .74            .72                .58
G               .52            .49            .42                .39
H               .74            .68            .62                .58
J               .51            .43            .37                .35
K               .00            .00            .00                .00
U               .25            .00            .00                .00





TABLE 4. ESTIMATES OF THE PROBABILITY, $m_{ii}$, THAT A MOVER IN THE $i$th CODE CATEGORY IN THE INITIAL QUARTER WILL BE IN THAT CATEGORY IN THE FOLLOWING QUARTER

(Females, 40-44)

Code Category   BKM Estimate   [heading illegible]
B               .67            .80
C               .63            .78
D               .70            .79
E               .70            .81
F               .76            .91
G               .72            .78
H               .67            .79
J               .64            .73
K               .58            .58
U               .76            .85

(The entries in the remaining columns are illegible in the source.)


TABLE 5. ESTIMATES OF THE PROBABILITY, $m_{iU}$, THAT A MOVER IN THE $i$th CODE CATEGORY IN THE INITIAL QUARTER WILL BE IN CATEGORY U IN THE FOLLOWING QUARTER

(Females, 40-44)

Code Category   BKM Estimate   [heading illegible]   [heading illegible]
B               .26            .26                   [illegible]
C               .30            .22                   .19
D               .21            .17                   .16
E               .23            .20                   .16
F               .20            [illegible]           .14
G               .23            [illegible]           .19
H               .25            .20                   .37
J               .28            .24                   .22
K               .24            .24                   .24
U               .76            .82                   .83








TABLE 6. COMPARISON OF THE ESTIMATED AND OBSERVED PROPORTION OF WORKERS IN THE $i$th CODE CATEGORY IN THE INITIAL QUARTER WHO WERE IN CATEGORY U IN THE ELEVENTH QUARTER

(Females, 40-44)

Code       BKM        $p_{iU}^{*(11)}$   [heading    [heading    [heading    Observed       Number of
Category   Estimate                      illegible]  illegible]  illegible]  Proportion     Observations
B          *.29       *.28               .40         .40         .46         [illegible]    (11)
C          .16        .23                .25         .25         .28         [illegible]    (267)
D          .16        .20                .20         .20         .22         [illegible]    (93)
E          .21        .26                .29         .29         .33         [illegible]    (142)
F          *.08       .12                .12         .12         .16         [illegible]    (61)
G          .24        .27                .29         .29         .32         [illegible]    (380)
H          .13        .17                .19         .19         .22         .17            (65)
J          .25        .31                .32         .32         .35         .43            (153)
K          .51        .54                .50         .50         .53         .25            (4)
U          .63        .54                .51         .50         .53         .53            (663)

* An asterisk indicates that the estimated proportion is less than the observed.










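(Table 6), when $i \ne U$ the BKM estimates were obtained from $p_{iU}^{(11)} = (1 - s_i)\, m_{iU}^{(11)}$, where $s_i$ and $m_{iU}^{(11)}$ were replaced by the corresponding BKM estimates; the other estimates were obtained analogously, as

$$p_{iU}^{*(11)} = (1 - \bar s_i^*)\, \bar m_{iU}^{*(11)}, \qquad \tilde{\tilde p}_{iU}^{(11)} = (1 - \hat{\hat s}_i)\, \tilde{\tilde m}_{iU}^{(11)},$$

$$\check p_{iU}^{(11)} = (1 - \hat{\hat s}_i)\, \check m_{iU}^{(11)}, \qquad \tilde p_{iU}^{(11)} = (1 - \tilde s_i)\, \tilde m_{iU}^{(11)}.$$

For $p_{UU}^{(11)}$ ($i = U$), the estimates presented in Table 6 were obtained in a somewhat similar fashion based, however, upon the fact that $p_{UU}^{(11)} = s_U + (1 - s_U)\, m_{UU}^{(11)}$. For purposes of comparison, the observed proportions, $\hat p_{iU}^{(11)}$, are also included in this table.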
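Under the mover-stayer model, $p_{ij}^{(n)} = s_i\,\delta_{ij} + (1 - s_i)\, m_{ij}^{(n)}$, where $\delta_{ij}$ is 1 if $j = i$ and 0 otherwise; for $i \ne U$ this is the formula behind the Table 6 estimates, and for $i = U$ it is the relation $p_{UU}^{(11)} = s_U + (1 - s_U)\, m_{UU}^{(11)}$. The sketch below computes it from an estimated mover matrix and estimated stayer proportions; the function name and the three-category inputs are hypothetical, not the BKM data.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    k = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(k)) for c in range(k)] for r in range(k)]


def n_step_proportion(M, s, i, j, n):
    """p_ij^(n) = s_i [j == i] + (1 - s_i) * (M^n)_ij under the mover-stayer model."""
    P = M
    for _ in range(n - 1):
        P = mat_mul(P, M)
    return (s[i] if j == i else 0.0) + (1.0 - s[i]) * P[i][j]


M = [[0.80, 0.05, 0.15],            # hypothetical mover matrix; last category plays U
     [0.10, 0.75, 0.15],
     [0.20, 0.10, 0.70]]
s = [0.40, 0.30, 0.00]              # hypothetical stayer proportions
U = 2
print([round(n_step_proportion(M, s, i, U, 11), 4) for i in range(3)])
```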

These tables indicate that the BKM estimates of $s_i$ (for females, 40-44) were usually too large, the BKM estimates of $m_{ii}$ were usually too small, the BKM estimates of $m_{iU}$ ($i \ne U$) were usually too large, and the BKM estimates of $p_{iU}^{(11)}$ ($i \ne U$) were usually too small. The following table indicates that the BKM estimates of $s_i$ (for males and females in the 20-24 and 40-44 age classifications) were usually too large.


TABLE 7. ESTIMATES OF THE PROPORTIONS, s;, OF WORKERS IN THE ith 
CODE CATEGORY IN THE INITIAL QUARTER WHO ARE STAYERS. 
ESTIMATES FOR DIFFERENT AGE AND SEX CLASSIFICATIONS 


Males, 20-24 | Males, 40-44 a shaman 20-24 Females, 40-44 
Code ne ee 7 -——|—— . = —_—— eons - 


Category| BKM | . BKM if F BKM | BKM 
| Estimate ? Estimate | - | Estimate | $s ea 


| Estimate | 
| 


47 ;} .18 
.68 .49 


B | .39 i 61 | a | 49 | 
| “69 59 
| 
| 
| 


C | 4] mi me 6S .38 
D -_ i 2 7 47 | .39 
E 43 | .2 73 | .60 | .42 
F di | 24 | .7%: ‘a |. .B4 .84 
G | 37 | 2] .@ 48 | .28 52 
H 42 a 7! ‘a .49 ef .74 
J .29 16 BE 4 .33 
I 04 : ; Lug | .00 


U 20 j .26 ‘ .26 


tN tw bt bo 


.58 


Po Nw Nw OS 


w 
~ 
So 


Source: [3], pp. 95-97, 116. 


While we have noted certain biases in the BKM estimators, it is important to point out that the method of estimation used by BKM can sometimes lead to quite accurate predictions, which will at times actually be more accurate than those based upon the estimators suggested herein. This can happen, for example, if the mover-stayer model is not a true description of the process generating the observed data, and if the biases in the BKM estimation methods are in the same directions and of the same magnitudes as the differences in predictions that would arise if these predictions were based upon a model that described the process generating the observed data more accurately than the mover-stayer model does. It is, therefore, important to be able to test whether the mover-stayer model is actually a satisfactory description of the observed process. The following section presents methods for testing various hypotheses concerning the observed process and the adequacy of the mover-stayer model as a description of it.

6. THE TESTING OF HYPOTHESES 

For the sake of simplicity, we first consider here the special case where $s_i = 0$ for $i = 1, 2, \ldots, I$. In this case, the mover-stayer model is simply the Markov
chain model. For tests of hypotheses concerning Markov chains, the reader is 
referred to [1], [2], [9]. In the present section, we shall show how results in 
[2] and [9] can be extended to obtain tests of hypotheses concerning the 
mover-stayer model. 

Assuming that there are no stayers, let $p_{ij;1} \ge 0$ be the probability that a worker who is in the $i$th code category in the initial quarter (i.e., quarter 1) will be in the $j$th category in the following quarter ($i, j = 1, 2, \ldots, I$). The $p_{ij;1}$ can be estimated from the data in the following cross-classification table:

$$\begin{pmatrix} w_{11}(1) & w_{12}(1) & \cdots & w_{1I}(1) \\ w_{21}(1) & w_{22}(1) & \cdots & w_{2I}(1) \\ \vdots & \vdots & & \vdots \\ w_{I1}(1) & w_{I2}(1) & \cdots & w_{II}(1) \end{pmatrix}$$

where $w_{ij}(t)$ is as in Section 4.2. (We first consider the case where $t = 1$.) When the row totals

$$w_i(1) = \sum_{j=1}^{I} w_{ij}(1)$$

are given, the entries $(w_{i1}(1), w_{i2}(1), w_{i3}(1), \ldots, w_{iI}(1))$ in the $i$th row will have a multinomial distribution with probabilities $(p_{i1;1}, p_{i2;1}, p_{i3;1}, \ldots, p_{iI;1})$ and sample size $w_i(1)$, assuming that the probabilities operate independently for each of the workers in a given code category. Hence, as an estimator of $p_{ij;1}$ we take $\hat p_{ij;1} = w_{ij}(1)/w_i(1)$. Since the distribution of the entries in the $i$th row is statistically independent of the distribution in the $k$th row ($i \ne k$), the data in the $i$th row can be considered as an independent sample of size $w_i(1)$ from a multinomial population. The large-sample joint distribution of $\sqrt{w_i(1)}\,(\hat p_{ij;1} - p_{ij;1})$ is normal and is simply what would be obtained for estimates of multinomial probabilities $p_{ij;1}$ ($j = 1, 2, 3, \ldots, I$) from independent samples ($i = 1, 2, 3, \ldots, I$) (see [1], [2], [9], [14]).

Let us now consider the null hypothesis that the probability that a worker will be in the $j$th category in the second quarter does not depend on his category in quarter 1; i.e., $p_{1j;1} = p_{2j;1} = \cdots = p_{Ij;1}$. This hypothesis states that the probabilities associated with each row in the above $I \times I$ cross-classification table are the same. To test this hypothesis the usual large-sample test of homogeneity for independent samples from multinomial populations can be applied (see [5, p. 445]). This test is the same as the standard $\chi^2$ test of independence in a cross-classification table. The $\chi^2$ statistic used for testing this hypothesis will have $(I - 1)(I - 1)$ degrees of freedom.
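The homogeneity test for the $I \times I$ table of $w_{ij}(1)$ is the ordinary Pearson $\chi^2$ computation. A dependency-free sketch follows; the function name and counts are hypothetical, it assumes every row and column total is positive, and it returns only the statistic and its degrees of freedom, leaving the $\chi^2$ tail probability to a table or a statistics library.

```python
def pearson_chi2(table: list[list[float]]) -> tuple[float, int]:
    """Pearson X^2 for equal row probabilities; returns (statistic, degrees of freedom)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    x2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / total
            x2 += (obs - expected) ** 2 / expected
    return x2, (len(table) - 1) * (len(table[0]) - 1)


w1 = [[10, 20],                      # hypothetical w_ij(1), I = 2
      [20, 10]]
stat, df = pearson_chi2(w1)
print(round(stat, 4), df)            # df = (I - 1)(I - 1)
```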

The previous remarks dealt only with quarter 1 and the following quarter. 







To study data from quarter 2 and the following quarter, a cross-classification table similar to the one given above can be obtained where the $w_{ij}(1)$ are replaced by the $w_{ij}(2)$. Also, the $p_{ij;1}$ are replaced by $p_{ij;2}$, the probability that a worker who is in the $i$th category in quarter 2 will be in the $j$th category in the following quarter. When the row totals

$$w_i(2) = \sum_{j=1}^{I} w_{ij}(2)$$

are given, similar remarks can be made about the distribution of $w_{ij}(2)$, $\hat p_{ij;2} = w_{ij}(2)/w_i(2)$, and $\sqrt{w_i(2)}\,(\hat p_{ij;2} - p_{ij;2})$. It can also be shown that the large-sample joint distribution of $\sqrt{w_i(2)}\,(\hat p_{ij;2} - p_{ij;2})$ is independent of the distribution of the corresponding statistics obtained for quarter 1. Hence, when the sample is large, the distribution of these statistics obtained for quarter 2 is simply what would be obtained for estimates of multinomial probabilities $p_{ij;2}$ ($j = 1, 2, 3, \ldots, I$) from independent samples ($i = 1, 2, 3, \ldots, I$), and these data are also independent of the corresponding data for quarter 1.

Let us now consider the null hypothesis that the observed data can be described by a Markov chain with a constant transition probability matrix, against the alternative hypothesis that the transition probability matrix is different in successive quarters. In other words, the null hypothesis is that $p_{ij;1} = p_{ij;2}$ for $i, j = 1, 2, 3, \ldots, I$. This hypothesis states that the probabilities associated with the $i$th row in the cross-classification table for quarter 1 (and the following quarter) are the same as the probabilities associated with the $i$th row in the table for quarter 2. To test this hypothesis we take the $i$th row in the table for quarter 1 and the $i$th row in the table for quarter 2, and form the following $2 \times I$ cross-classification table:

$$\begin{pmatrix} w_{i1}(1) & w_{i2}(1) & w_{i3}(1) & \cdots & w_{iI}(1) \\ w_{i1}(2) & w_{i2}(2) & w_{i3}(2) & \cdots & w_{iI}(2) \end{pmatrix}$$

For each row $i$, a table can be formed, and the hypothesis that $p_{ij;1} = p_{ij;2}$ ($j = 1, 2, \ldots, I$) can be tested by the same $\chi^2$ procedure that is used for a test of independence in a $2 \times I$ table. Each $\chi^2$ will have $I - 1$ degrees of freedom. The large-sample distribution of the $\chi^2$ statistic for row $i$ is independent of the distribution for row $k$ ($i \ne k$). Since the total number of such $2 \times I$ tables is $I$, there will be $I$ independent $\chi^2$ statistics. The sum of these $I$ $\chi^2$ statistics can be used to test $p_{ij;1} = p_{ij;2}$ for all $i, j = 1, 2, \ldots, I$; this sum will have the usual $\chi^2$ distribution with $I(I - 1)$ degrees of freedom.
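The test of $p_{ij;1} = p_{ij;2}$ can be sketched the same way: one $2 \times I$ table per row $i$, with the $I$ statistics summed to give $I(I - 1)$ degrees of freedom in total. The helper repeats the Pearson computation so that the block stands alone; names and counts are hypothetical, and all column totals in each $2 \times I$ table are assumed positive.

```python
def pearson_chi2(table):
    """Pearson X^2 and its degrees of freedom for a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    x2 = sum((obs - rows[a] * cols[b] / total) ** 2 / (rows[a] * cols[b] / total)
             for a, r in enumerate(table) for b, obs in enumerate(r))
    return x2, (len(table) - 1) * (len(table[0]) - 1)


def constant_transition_test(w_q1, w_q2):
    """Sum over rows i of X^2 for the 2 x I table [w_i.(1); w_i.(2)]."""
    total_x2, total_df = 0.0, 0
    for row1, row2 in zip(w_q1, w_q2):
        x2, df = pearson_chi2([row1, row2])
        total_x2 += x2
        total_df += df
    return total_x2, total_df        # total_df = I(I - 1)


w_q1 = [[10, 20], [20, 10]]          # hypothetical w_ij(1)
w_q2 = [[20, 10], [20, 10]]          # hypothetical w_ij(2)
print(constant_transition_test(w_q1, w_q2))
```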

The method described here to test the hypothesis that $p_{ij;1} = p_{ij;2}$ ($i, j = 1, 2, \ldots, I$) can be generalized in a straightforward fashion in order to obtain a test of the hypothesis that $p_{ij;1} = p_{ij;2} = p_{ij;3} = \cdots = p_{ij;T}$. Similarly the method described earlier herein to test the hypothesis that $p_{1j;1} = p_{2j;1} = \cdots = p_{Ij;1}$ ($j = 1, 2, \ldots, I$) can be generalized in order to obtain a test of the hypothesis that $p_{1j;t} = p_{2j;t} = \cdots = p_{Ij;t}$ ($j = 1, 2, \ldots, I$) for $t = 1, 2, \ldots, T$.
We have illustrated how hypotheses concerning Markov chains can be tested 
using the standard technique of computing a x? statistic in the same manner as 
is done for tests of independence in cross-classification tables. This approach 





THE MOVER-STAYER MODEL 863 


has been developed further in Anderson and Goodman [2] and Goodman [9], 
where tests are also presented for hypotheses other than those discussed here. 
For example, a test is presented there for the hypothesis that the code category
of a worker in a specified quarter is dependent on his code category in the
previous quarter, against the alternate hypothesis that it is dependent on his
code category in the previous $h$ quarters ($h \geq 2$). In other words, this test is
concerned with determining the extent of the after-effect of a worker’s past 
history. W. Feller [7] has stated that, in the study of labor mobility, the 
“after-effect of past history is probably so pronounced that higher-order 
Markov chains must be introduced;” the techniques presented in [2] and [9] 
could be used to determine, in a certain sense, the magnitude of the after-effect 
of past history and the order of the Markov chain that should be introduced. 

The previous remarks in this section assumed that $s_i = 0$ for $i = 1, 2, \dots, I$.
If there are stayers (i.e., if $s_i > 0$ for some $i$), and if all stayers can be identified,
then the methods described above can be applied to the data for the movers
(separating them from the stayers). Denoting by $c_i$ the number of workers in
the $i$th code category in the initial quarter who remained continuously employed
in that category for the next $n$ quarters, it is easy to see that as $n \to \infty$
the probability that $c_i = w_i s_i$ approaches one (when $w_i$ is a fixed constant).
Thus when $n$ is large, the data for the

$$c = \sum_{i=1}^{I} c_i$$

workers can be separated from the data for the remaining $w - c$ workers, and
the latter data can be thought of as a first approximation to the data for the
movers. This can be done when $n$ is sufficiently large so that

$$\Pr\{c_i = w_i s_i\} = \left(1 - m_{ii}^{n}\right)^{w_i (1 - s_i)} \qquad (6.1)$$
is close to one. In this case, the following kinds of hypotheses can be tested
using the general approach described above applied to the data for the $w - c$
workers: (a) that the transition probability matrix for movers is constant,
against the alternate hypothesis that the matrix is different for different
quarters; (b) that the code category of a mover in a specified quarter does not
depend upon his category in the previous quarter, against the alternate hypothesis
that it does; (c) that the code category of a mover in a specified quarter
is dependent on his code category in the previous quarter, against the alternate
hypothesis that it is dependent on his code category in the previous $h$ quarters
($h \geq 2$); (d) that the transition probability matrix for one group of movers (say
females, 40-44) is the same as the transition probability matrix for a different
group of movers (say females, 50-54), against the alternate hypothesis that it
is not; etc. These tests based on the $w - c$ workers can be justified when $n \to \infty$
and then $w_i \to \infty$ ($i = 1, 2, \dots, I$); the order in which the limits are taken cannot
be interchanged here.
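Condition (6.1) governs how many quarters must be observed before the $w - c$ workers can stand in for the movers. The sketch below evaluates (6.1), read as $\Pr\{c_i = w_i s_i\} = (1 - m_{ii}^n)^{w_i(1-s_i)}$, where $m_{ii}$ is a mover's one-quarter probability of remaining in category $i$, and searches for the smallest adequate $n$. The function names, the parameter values, and the 0.95 cut-off standing in for "close to one" are hypothetical choices for illustration, not figures from the BKM data.

```python
def prob_all_movers_left(m_ii, w_i, s_i, n):
    """Equation (6.1): probability that c_i = w_i * s_i, i.e. that each of the
    w_i * (1 - s_i) movers starting in category i leaves it within n quarters."""
    return (1.0 - m_ii ** n) ** (w_i * (1.0 - s_i))


def quarters_needed(m_ii, w_i, s_i, target=0.95, max_n=1000):
    """Smallest n for which (6.1) is at least `target` (a hypothetical cut-off)."""
    for n in range(1, max_n + 1):
        if prob_all_movers_left(m_ii, w_i, s_i, n) >= target:
            return n
    return None  # target not reached within max_n quarters


# Hypothetical: 20 workers in category i, half of them stayers, and a mover
# who remains in category i in a given quarter with probability 0.5.
print(quarters_needed(m_ii=0.5, w_i=20, s_i=0.5))  # 8
```

Roughly, (6.1) is near one when $w_i(1 - s_i)\,m_{ii}^{n}$ is small, so the required $n$ grows only logarithmically with the number of movers but sharply as $m_{ii}$ approaches one.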
If it were known that $s_i = 0$ for $i = 1, 2, \dots, I$, the general approach described
above could be applied to the data for all $w$ workers, rather than to
the data for the $w - c$ workers. It is therefore of interest to be able to test the





864 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


null hypothesis that $s_i = 0$. A simple test of this hypothesis (i.e., the hypothesis
that the process can be described by the Markov chain model, against the
alternate hypothesis that the mover-stayer model is necessary) can be based
upon a comparison of $f_i = c_i / w_i$ with its expected value, $p_{ii}^{n}$, under the null
hypothesis, where $p_{ii}$ is the probability that a worker (i.e., under the null hypothesis,
a mover) will be in the $i$th code category in quarter $t + 1$ given that
he was in that category in quarter $t$ ($t = 1, 2, \dots$); we estimate $p_{ii}$ by

$$\hat p_{ii} = \sum_{t=1}^{T} w_{ii}(t) \Big/ \sum_{t=1}^{T} w_i(t).$$


The large-sample distribution (under the null hypothesis, with $w \to \infty$ and with
$n = T \geq 1$ a fixed constant) of the statistic $d_i = f_i - \hat p_{ii}^{n}$ is normal with zero mean
and variance of the form

$$\frac{p_{ii}^{n}\,(1 - p_{ii}^{n})}{w_i} - \frac{n\,p_{ii}^{2n-1}\,(1 - p_{ii})}{W_i}, \qquad (6.2)$$

where $W_i = \frac{1}{n}\sum_{t=1}^{T} w_i(t)$. The variance (6.2) can be estimated by

$$S_i^2 = \frac{\hat p_{ii}^{n}\,(1 - \hat p_{ii}^{n})}{w_i} - \frac{n\,\hat p_{ii}^{2n-1}\,(1 - \hat p_{ii})}{W_i}, \qquad (6.3)$$

and the statistic $X_i^2 = d_i^2 / S_i^2$ can be used to test the null hypothesis that $s_i = 0$;
the large-sample distribution (under the null hypothesis) of the statistic $X_i^2$ will
be the $\chi^2$ distribution with one degree of freedom. The statistic

$$\sum_{i=1}^{I} X_i^2$$

can be used to test the null hypothesis that all $s_i = 0$ ($i = 1, 2, \dots, I$); its
large-sample distribution will be $\chi^2$ with $I$ degrees of freedom. (Similar tests
can be obtained for the case where $n \neq T$, but we shall not go into these details
here.) These tests do not require that $n$ (or $T$) be large.
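A sketch of the test of $s_i = 0$ for a single category follows. It covers only the case $n = T$ and uses our reading of the variance estimate (6.3), $S_i^2 = \hat p_{ii}^n(1-\hat p_{ii}^n)/w_i - n\,\hat p_{ii}^{2n-1}(1-\hat p_{ii})/W_i$ with $W_i = \frac{1}{n}\sum_{t=1}^{T} w_i(t)$; that reading, the function and argument names, and the counts are assumptions made for illustration.

```python
def stayer_test(w_i, c_i, w_ii, w_i_t):
    """X_i^2 for the null hypothesis s_i = 0 (no stayers in category i), n = T.

    w_i   : workers in category i in the initial quarter
    c_i   : of these, how many stayed in category i for the next n quarters
    w_ii  : [w_ii(1), ..., w_ii(T)], workers in i in quarter t still in i at t+1
    w_i_t : [w_i(1), ..., w_i(T)], workers in category i in quarter t
    """
    n = len(w_ii)                      # this sketch takes n = T
    p = sum(w_ii) / sum(w_i_t)         # estimate of p_ii
    d = c_i / w_i - p ** n             # d_i = f_i - p_ii^n
    W = sum(w_i_t) / n                 # W_i: mean of w_i(t) (assumed reading)
    s2 = p ** n * (1 - p ** n) / w_i - n * p ** (2 * n - 1) * (1 - p) / W
    if s2 <= 0:
        raise ValueError("variance estimate not positive; sample too small")
    return d * d / s2                  # compare with chi-square, 1 d.f.


# Hypothetical counts for one category over T = n = 2 quarters.
x2 = stayer_test(w_i=100, c_i=60, w_ii=[70, 70], w_i_t=[100, 100])
print(round(x2, 2))  # 27.44
```

Here $f_i = 0.60$ against $\hat p_{ii}^2 = 0.49$, and $X_i^2$ far exceeds 3.841, the 5 per cent point of $\chi^2$ with one degree of freedom, so these invented data would point to stayers in category $i$. Summing $X_i^2$ over categories gives the statistic with $I$ degrees of freedom.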

We noted earlier herein that when n was sufficiently large, tests were avail- 
able for various hypotheses concerning the mover-stayer model; e.g., the 
hypothesis that the code category of a mover in the second quarter does not 
depend upon his category in the first quarter. We shall now discuss some in- 
direct methods for testing this and other hypotheses when n is not so large. 

Let $m_{ij;1}$ denote the probability that a mover, who is in the $i$th code category
in quarter 1, will be in the $j$th category in the following quarter. The null
hypothesis is that $m_{1j;1} = m_{2j;1} = m_{3j;1} = \cdots = m_{Ij;1} = m_{j;1}$. The relevant data are
given in the following cross-classification table:







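$$
\begin{array}{ccccc}
 & w_{12}(1) & w_{13}(1) & \cdots & w_{1I}(1) \\
w_{21}(1) & & w_{23}(1) & \cdots & w_{2I}(1) \\
w_{31}(1) & w_{32}(1) & & \cdots & w_{3I}(1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_{I1}(1) & w_{I2}(1) & w_{I3}(1) & \cdots &
\end{array}
$$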


The diagonal entries (when i=j) are not given in this table since we do not 
know how many movers are included among those workers who remained in 
the same category in quarter 1 and the following quarter. 

From the results presented in Section 3, we see that, when the null hypothesis
is true, the maximum likelihood estimates of the probabilities $m_{j;1}$ (based
upon the data in the above table) can be obtained from the solution of the
system of equations

$$\lambda\, m_{j;1}^{2} + m_{j;1}\left[-\lambda - w_j'(1) + w_j^{*}(1)\right] + w_j'(1) = 0, \qquad (6.4)$$

where $w_j^{*}(1)$ is the sum of the entries in the $j$th row in the table given above,
$w_j'(1)$ is the sum of the entries in the $j$th column, and $\lambda$ is a constant determined
by the condition $\sum_{j} m_{j;1} = 1$. Methods for solving this system of equations
were discussed in Section 3 herein.

When the values $w_i^{*}(1)$ are given, the entries in the $i$th row will have a multinomial
distribution with probabilities $m_{ij;1}/(1 - m_{ii;1})$ for $j \neq i$ and sample size
$w_i^{*}(1)$. Under the null hypothesis, the probabilities will be $m_{j;1}/(1 - m_{i;1})$,
for $j \neq i$, and the expected value of $w_{ij}(1)$ will be $E[w_{ij}(1)] = w_i^{*}(1)\, m_{j;1}/(1 - m_{i;1})$
(see [3, p. 46]). The maximum likelihood estimate of $E[w_{ij}(1)]$ is

$$E_{ij} = w_i^{*}(1)\, \hat m_{j;1} / (1 - \hat m_{i;1}),$$

where $\hat m_{j;1}$ are the estimates of $m_{j;1}$ obtained by solving (6.4). The null hypothesis
that $m_{ij;1} = m_{j;1}$ for $i = 1, 2, \dots, I$ can now be tested by a procedure
similar to a standard $\chi^2$ goodness-of-fit test. When the null hypothesis is true,
the large-sample distribution of

$$\sum_{i \neq j} \left[w_{ij}(1) - E_{ij}\right]^2 / E_{ij} \qquad (6.5)$$

will be a $\chi^2$ distribution with $I(I - 2) - (I - 1) = (I - 1)^2 - I = I^2 - 3I + 1$ degrees
of freedom. Thus, (6.5) can be used to test this null hypothesis.
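The estimation and test can be sketched as follows. Instead of the solution methods of Section 3, this sketch swaps in a different technique, iterative proportional fitting (IPF): under the null hypothesis the off-diagonal expectations have the product form $E_{ij} = a_i b_j$ ($i \neq j$) with $a_i = w_i^{*}(1)/(1 - m_{i;1})$ and $b_j = m_{j;1}$, and at convergence the fitted row and column totals match the observed $w_i^{*}(1)$ and $w_j'(1)$, which should coincide with a solution of (6.4). The function names, the iteration cap and the tolerance are hypothetical choices; `bkm_estimates` computes the simple column proportions $w_j'(1)/\sum_j w_j'(1)$ for comparison.

```python
def _margins(w):
    """Off-diagonal row totals w*_i(1) and column totals w'_j(1); diagonal ignored."""
    I = len(w)
    row = [sum(w[i][j] for j in range(I) if j != i) for i in range(I)]
    col = [sum(w[i][j] for i in range(I) if i != j) for j in range(I)]
    return row, col


def fit_destination_probs(w, iters=10000, tol=1e-12):
    """Estimates of m_{j;1} under m_{ij;1} = m_{j;1}, fitted by IPF."""
    I = len(w)
    row, col = _margins(w)
    m = [1.0 / I] * I
    for _ in range(iters):
        # row step: a_i makes fitted row i total w*_i(1), using sum_j m_j = 1
        a = [row[i] / (1.0 - m[i]) for i in range(I)]
        # column step: b_j makes fitted column j total w'_j(1)
        b = [col[j] / sum(a[i] for i in range(I) if i != j) for j in range(I)]
        total = sum(b)
        new_m = [x / total for x in b]  # rescale so that the m_j sum to one
        converged = max(abs(x - y) for x, y in zip(new_m, m)) < tol
        m = new_m
        if converged:
            break
    return m


def destination_chi2(w):
    """Statistic (6.5), its I^2 - 3I + 1 degrees of freedom, and the estimates."""
    I = len(w)
    row, _ = _margins(w)
    m = fit_destination_probs(w)
    stat = 0.0
    for i in range(I):
        for j in range(I):
            if i != j:
                e = row[i] * m[j] / (1.0 - m[i])  # E_ij
                stat += (w[i][j] - e) ** 2 / e
    return stat, I * I - 3 * I + 1, m


def bkm_estimates(w):
    """Column proportions w'_j(1) / sum_j w'_j(1)."""
    _, col = _margins(w)
    return [c / sum(col) for c in col]


# Hypothetical off-diagonal counts; diagonal cells are unknown (None).
w = [[None, 10, 10],
     [10, None, 10],
     [40, 40, None]]
stat, df, m = destination_chi2(w)
print(df, [round(x, 3) for x in m], [round(x, 3) for x in bkm_estimates(w)])
# 1 [0.333, 0.333, 0.333] [0.417, 0.417, 0.167]
```

In this invented table the null hypothesis fits exactly (the statistic is essentially zero), yet the column proportions differ sharply from the maximum likelihood estimates, because the large third row inflates the apparent share of moves into the first two categories.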
The estimates $\hat m_{j;1}$ from (6.4), which are used to compute the $E_{ij}$ in (6.5),
do differ from

$$\tilde m_{j;1} = w_j'(1) \Big/ \sum_{j=1}^{I} w_j'(1),$$

the estimates suggested by the BKM approximation [3, p. 46]. The computational
formula for the $\chi^2$ statistic given on p. 45 of reference [3] assumes that
$\hat m_{j;1}$ can be replaced by $\tilde m_{j;1}$; results presented in Section 3 herein indicate







that the difference between $\hat m_{j;1}$ and $\tilde m_{j;1}$ can be large. Therefore, this formula
should be replaced by the method described herein.

We have now seen that, for a particular hypothesis concerning the mover- 
stayer model, a significance test procedure could be devised that was similar 
to a standard x? goodness of fit test. A similar procedure can be developed for 
other hypotheses concerning this model. For example, let us consider the hypoth- 
esis that the probability that a mover will be in the jth code category in the 
(n+1)th quarter does not depend on his code category in the initial quarter. 
This is the hypothesis that $M^{n}$ approximates $M^{\infty}$ (see Section 3). The relevant
data are given in the cross-classification table in Section 3, where the $z_{ij}$ ($i \neq j$)
defined in Section 3 are the cell entries. To obtain a test of this null hypothesis,
the previous comments relating to the $w_{ij}(1)$ data ($i \neq j$) can now be applied to
the $z_{ij}$.

The methods developed in this section will be appropriate when a large 
sample is studied. If the expected number of observations in some of the cells 
of the cross-classification tables discussed above is very small, then we must
be cautious in our use of the $\chi^2$ statistic. (This is also true for the standard $\chi^2$
tests in cross-classification tables.) In this case, perhaps the sample size could
be increased, or some of the code categories could be combined (if the hypoth- 
esis to be tested remained valid for the combined categories), which would 
increase the expected numbers in the cells of the cross-classification tables. 

In closing this section, it may be worthwhile to mention that certain kinds
of standard $\chi^2$ tests are not justified for testing hypotheses concerning either the
mover-stayer model or the Markov chain model. For example, let us consider
the table and results presented in Sec. 3.3.2 of reference [3]. To illustrate the
point we wish to make, we shall only consider one of the comparisons mentioned
there, which can be displayed in the following $2 \times 2$ table:


TABLE 8: DISTRIBUTION OF CODE CATEGORY 
DESTINATIONS, IN PER CENT 


Age Group | Female moves into Code Category G | Female moves not into Code Category G | Number of moves
20-24     | 30.2                              | 69.8                                  | 1,507
60-64     | 43.6                              | 56.4                                  |


A total of 1,507 one-quarter moves were made by females in the 20-24 age
group, and 30.2 per cent of these moves were into code category G (Trade).
In general, the sampling distribution of the per cents given in the first row of the
above table will differ from the binomial distribution with sample size 1,507,
since the moves are not 1,507 independent observations. (The same females
appear several times among the 1,507 observations.) Also, the $\chi^2$ statistic
computed for this $2 \times 2$ table will not have the usual distribution with 1 degree
of freedom. Hence, the standard interpretation of these kinds of tests is not justified.







7. SOME SUPPLEMENTARY REMARKS 


In the statistical analysis presented in Sections 2-4 herein, it was assumed
that the transition probability matrix for movers was constant during the 
period of investigation, 1947-49. Whether the transition probability matrix is 
actually constant will depend upon the definition of the code categories used, 
the particular population and subpopulations studied, the stability (in a 
certain sense) of economic activity relating to these populations, etc. Perhaps
chains with non-constant transition probability matrices or chains of higher- 
order should be used (see [2], [9]). (The 1947-49 period was characterized 
by a high and relatively stable level of economic activity, although the “mild 
recession” in 1949 did have some effect on the labor market.) Using some 
of the methods presented in references [2], [3], [9], and herein, various aspects 
of the data can be examined in order to see what, if any, are the kinds of regu- 
larities in the process and what are the kinds of models that can be used to 
fit the observed phenomena.

By a careful examination of certain aspects of the data relating to a par- 
ticular model, the irregularities that may be observed might suggest for study 
other aspects of the data and other hypotheses and models. If sufficient atten- 
tion is not paid to the deviations and irregularities from the specific model that 
is used, then there is a danger of misinterpreting the data. In order not to 
misinterpret the data, we must use a model that actually describes the phe- 
nomena under investigation or is a sufficiently close approximation to it. In 
developing a satisfactory model, it is often important to separate the hetero- 
geneous groups in the population into homogeneous subpopulations. For ex- 
ample, the mover-stayer model is an attempt to distinguish between two groups, 
the movers and stayers. In the BKM work [3], there is also the separation of 
the workers by an age and sex classification. Other possible classifications might 
be used to divide the population into more homogeneous subpopulations per- 
mitting possibly a more accurate prediction of behavior. Some of these classi- 
fications are: occupation, geographic region, skill, race, previous work his- 
tories, e.g., length of continuous service in a given code category, time of en- 
gagement in a given category, time of entry or withdrawal from the labor 
force, etc. (see [11]).

The procedure of subdividing the population with regard to a specific classi- 
fication (e.g., age or sex), and then analyzing the data for the different sub- 
populations in the classification, is useful also in studying the relation be- 
tween the specific classification and the process under study. For example, 
BKM [3] indicate that, for the various age and sex classes, there are clear- 
cut differences in the total amount of movement among code categories, and 
they discuss the differential mobility of workers of different age and sex. One 
could think of other such relevant classifications, and study their relation to 
the movement among code categories. A somewhat different method for divid- 
ing the data into relevant parts can be illustrated by the BKM investigation 
of year and seasonal influences [3]. In this case, the data were divided with 
respect to the year and seasonal classification, thus permitting the simul- 
taneous analysis of mobility by year and by season. 










One might also be interested in the relation between various socio-economic 
indices and various aspects of the data. For example, the relation between the 
one-quarter transition probability matrices and “level of economic activity” 
might be studied. The period of time studied by BKM [3] was not long enough, 
and the level of economic activity during this period did not vary enough to 
use the one-quarter transition probability matrices in order to study this rela- 
tionship. It may very well be the case that in studying data over a longer and 
less stable period of time, the irregularities in the transition probability
matrices may be found to be due to other factors, such as general economic
conditions, relative economic level of particular industries, technological
changes leading to the development of new industries, the relative geographic 
movement of industries, income or wage differentials, etc. Some of the methods 
presented by BKM [3] and herein can be used to study these relations. They 
could also be used to study occupational and social mobility and how these 
forms of mobility are related to the data on industrial mobility. 


8. REFERENCES 
[1] Anderson, T. W., “Probability models for analyzing time changes in attitudes,”
Mathematical Thinking in the Social Sciences, edited by Paul F. Lazarsfeld. Glencoe,
Illinois: The Free Press, 1954.

[2] Anderson, T. W. and Goodman, Leo A., “Statistical inference about Markov
Chains,” Annals of Mathematical Statistics, 28 (1957), 89-110.

[3] Blumen, Isadore, Kogan, Marvin, and McCarthy, Philip J., The Industrial Mobility
of Labor as a Probability Process. Volume VI of Cornell Studies of Industrial and
Labor Relations. The New York State School of Industrial and Labor Relations,
Cornell University, Ithaca, New York, 1955.

[4] Bush, Robert R. and Cohen, Bernard P., Book Review of Reference [3], Journal of the
American Statistical Association, 51 (1956), 702-4.

[5] Cramér, Harald, Mathematical Methods of Statistics, Princeton University Press,
Princeton, 1946.

[6] Feller, William, An Introduction to Probability Theory and Its Applications, Volume 1.
New York: John Wiley and Sons, 1950.

[7] Feller, William, Book Review of Reference [3], Psychometrika, 21 (1956), 217.

[8] Glass, D. V., Editor, Social Mobility in Britain, Routledge and Kegan Paul, London,
1954.

[9] Goodman, Leo A., “Some statistical methods for the analysis of certain kinds of social
processes,” (1956, mimeo). (This article will appear in a volume edited by Paul F.
Lazarsfeld, The Empirical Study of Short-Range Change: The Panel Technique.)

[10] Kitagawa, Evelyn, Book Review of Reference [3], American Journal of Sociology, 62
(1956), 233.

[11] Lane, K. G. and Andrew, J. E., “A method of labor turnover analysis,” Journal of
the Royal Statistical Society, Series A, 118 (1955), 296-314.

[12] Mahoney, Thomas A., Book Review of Reference [3], Econometrica, 25 (1957),
379-81.

[13] Perlman, Jacob and Mandel, Benjamin, “The continuous work history sample
under old-age and survivors insurance,” Social Security Bulletin, February, 1944.

[14] Prais, S. J., “Measuring social mobility,” Journal of the Royal Statistical Society,
Series A, 118 (1955), 56-66.











FORECASTING INDUSTRIAL PRODUCTION¹


H. O. STEKLER
University of California, Berkeley 


This paper analyzes the predictive performance of several forecasting 
techniques: (1) a leading series regression, (2) the independent com- 
ponents of these leading series, and (3) a diffusion index. The compara- 
tive predictive performance in forecasting the turning points of the 
FRB Index of Production was measured by means of lead-reliability 
functions. The use of lead-reliability functions requires rules for identi- 
fying true and false peaks. In applying these rules, it was shown that 
the estimated values of the lead-reliability relationship are sensitive to 
the rule which is used to identify the true turns. A comparison of the 
predictive performance of the three techniques shows that both the 
diffusion index and the independent components are better predictors 
than the regression. 


1. INTRODUCTION 


THE ability of a leading series regression to predict the Federal Reserve
Board Index of Production has been previously analyzed.² Maher originally
compared this regression with a diffusion index constructed from the
same leading series. Alexander and the author compared the forecasting behavior
of this regression (LSR) with that of an autoregression of the FRB
Index alone. The leading series regression was favored in both of the previous
analyses.

Two questions which remain unanswered will be discussed in this paper. 
First, is the LSR a better predictor of the turning points of the FRB Index 
than the diffusion index constructed from the components of the FRB Index? 
The ability of this diffusion index to forecast the turning points of the FRB
Index has already been demonstrated.³ Secondly, since the independent variables
in the LSR are intercorrelated,⁴ it would be appropriate to determine
whether the independent components of the leading series might not be better 
forecasters than the LSR. 


2. LEADING SERIES REGRESSION VS. DIFFUSION INDEX 


The comparative predictive performance of the LSR and diffusion index of 
the FRB Index of Production will be measured by their respective lead-relia- 
bility functions. The data required to perform this analysis are: (1) the aver- 








1 This work was done in part at the M.I.T. Computation Center, Cambridge, Massachusetts, and is contained
in part within the author's Ph.D. thesis, Essays in Economic Forecasting, which is deposited in the library of the
Massachusetts Institute of Technology. The author is grateful for the financial assistance provided by the General
Electric Charitable and Educational Fund, and indebted to Professor Sidney S. Alexander for the many helpful
comments which he provided on portions of an earlier draft of this paper.

2 John E. Maher, “Forecasting Industrial Production,” Journal of Political Economy, LXV (April, 1957), 
pp. 158-65; Geoffrey H. Moore, “Forecasting Industrial Production—A Comment,” Journal of Political Economy, 
LXVI (February, 1958), pp. 74-83; and S. S. Alexander and H. O. Stekler, “Forecasting Industrial Production:
Leading Series versus Autoregression,” Journal of Political Economy, LXVII (August, 1959), pp. 402-9. Also see
S. S. Alexander, “Rate of Change Approaches to Forecasting: Diffusion Indexes and First Differences,” Economic
Journal, LXVIII (June, 1958), 288-301. 

3 S. S. Alexander, op. cit.

4 Alexander and Stekler, op. cit., p. 402. 










age lead of the true turns of each predictor over those of the predictand, and 

(2) the ratio of true to total turns associated with every average lead. 

The method of identifying turns, whether they be true or false, has previously
been discussed.⁵ For this analysis, the smoothing criterion, “number of
months up or down,” was again used to identify the turning points. The number
of true turns was subtracted from the total number of turns predicted by
each criterion, thus yielding the number of false turns associated with each criterion.
For every smoothing criterion, the lead of the true turns is the actual
lead of the true turns minus one less than the number of months in the “months
up or down” criterion; that is, with a k-month criterion, k - 1 months are subtracted
from the actual lead to allow for the lag needed to identify the turn.
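The bookkeeping just described can be sketched in a few lines. The function name and the sample leads are hypothetical; reliability is taken, as at the start of this section, to be the ratio of true to total turns, and the k - 1 month adjustment follows the rule stated above.

```python
def lead_reliability(true_leads, total_turns, k):
    """False turns, reliability and adjusted average lead for a k-month criterion.

    true_leads  : actual leads, in months, of the predictor's true turns
                  over the corresponding turns of the predictand
    total_turns : all turns (true and false) signalled under the criterion
    k           : number of months in the 'months up or down' criterion
    """
    n_true = len(true_leads)
    n_false = total_turns - n_true             # false turns by subtraction
    reliability = n_true / (n_true + n_false)  # true turns / total turns
    # subtract the k - 1 months needed before a turn can be identified
    avg_lead = sum(true_leads) / n_true - (k - 1)
    return n_false, reliability, avg_lead


# Hypothetical: three true turns with actual leads of 10, 6 and 8 months,
# four turns in all, and a three-months-up-or-down criterion.
print(lead_reliability([10, 6, 8], total_turns=4, k=3))  # (1, 0.75, 6.0)
```

Repeating the calculation for k = 1, 2, 3, ... traces out the lead-reliability function: a longer criterion usually removes false turns but costs months of lead.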

The identification of the true turns is the crucial element of the required 
data. After a count of these true forecasts, the number of false turns is ob- 
tained. Likewise the average lead of the predictor’s true turning points over 
the turns of the predictand can thus be computed. Despite the importance of 
identifying true turns, no criteria which would enable observers to distinguish 
true from false leads have as yet been devised. 

Contemporaneously, there is no way of knowing whether an observed turn 
is true or false. This classification can only be made after the fact. The purpose 
of devising rules to distinguish historically true from false forecasts is to enable 
us to better analyze the predictive power of forecasting techniques. 

Common sense might be helpful in outlining some of the characteristics that 
distinguish true from false leads. A trough (peak) should obviously not be pre- 
dicted before the predictand’s preceding peak (trough) has occurred. Secondly, 
the predictor would be expected to predict the ensuing peak (trough) only after 
it had forecast the present trough (peak). Finally, the predictor should also not 
“forecast” an event many months after its occurrence. With these observations 
as guides, it is possible to define several rules which are helpful in distinguishing
between true and false peaks.⁶
Rule I. A true peak is that turning point where the predictor attained the highest
level after the occurrence of both the predictand's previous trough and the
forecast of that trough, provided that the prediction of the peak does not lag
the event by more than six months.

Rule II. A true peak is that turning point where the predictor attained the highest level
after the occurrence of both the predictand's previous trough and the forecast
of that trough by the predictor within the specified period: X months before
and Y months after the predictand's peak.⁷

Rule III. A true peak is the predictor's forecast of a peak which is the closest in timing
to the predictand's peak, provided that it occurs no more than X months
before or Y months after the event. There is the additional proviso that it
must occur after the predictand's previous trough and the prediction thereof.


The distinguishing characteristics of these rules should be noted. With Rule 





5 Alexander and Stekler, op. cit., pp. 406-7. It should be noted that the count of the number of turning points is
independent of the identification of the true turning points.

6 With the appropriate terminological substitutions (i.e., trough for peak; lowest for highest; etc.) these rules
could also be used for distinguishing between true and false troughs.

7 One might even argue that a forecast must always come before the actual turning point and that every “prediction”
after the event should be counted as a false turn. In this case, Y would be equal to zero in both Rules II
and III. On the other hand, there are instances when even a lagged “prediction” might yield some information.
It was for this reason that a lag of up to six months was accepted. In practice, the prediction invariably occurs
before the turning point, and this argument therefore becomes academic.





FORECASTING INDUSTRIAL PRODUCTION 871 


I, if there is at least one turning point between the trough (or the forecast of the
trough, whichever is later) and six months after the peak, a true forecast must,
by definition, be obtained. There is no true turn only if no turn occurs during
this period. This rule for identifying true turns thus virtually assures us that we
will predict every turn of the predictand.

In the same way that varying the smoothing criterion yielded a measure of 
the trade-off or rate of exchange between length of lead and the number of false 
turns predicted, the variation of the time element in Rule II will provide a 
rate of exchange between true and false leads. As the time span around the
date of the actual event is decreased, it becomes less likely that a predictor's turning
point will occur during this interval. A turning point that
was previously called true (with Rule I) would now be classified as false if it
failed to fall within the specified time interval. Thus, we could, if we desired,
show how the number of true leads is a function of the parameters X and Y of 
Rule II. 

The transition from Rule II to Rule III neither increases nor decreases the 
number of true turns which are forecast, provided that the same values of X 
and Y are chosen. The latter rule only provides another means of distinguishing 
between the true and false turns which occur in the desired interval. If no true 
turns were distinguished by Rule II, neither will any be so classified by Rule 
III. However, if more than one turn occurs within the selected time interval, 
Rule II will call one true, while the use of Rule III may claim a different one as 
the true turn. If there is only one turn within the interval, both rules must 
necessarily call it true. Thus it is clear that the number of true turns predicted 
by Rules II and III is identical, but their average lead may vary. When X is 
unspecified and Y equals 6 in Rules II and III, the number of true turning 
points which are identified is identical for all three Rules. However, the dating 
of the turns under Rule III is likely to be different from the dating obtained 
from the other two Rules, and consequently the average lead of true turns is 
usually less with Rule III than with Rules I and II.⁸
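The three rules can be compared on a toy example. The sketch below is one possible encoding, not the author's procedure: months are integer indices, `start` stands for the later of the predictand's previous trough and the predictor's forecast of it, ties are broken by list order, and the function name, peak dates and levels are invented for illustration.

```python
def true_peak(peaks, start, event, rule, x=None, y=6):
    """Return the predictor's true peak (month, level) under Rule I, II or III.

    peaks : predictor's peaks as (month, level) pairs
    start : month by which both the predictand's previous trough and the
            predictor's forecast of that trough have occurred
    event : month of the predictand's peak
    x, y  : window of at most x months before and y months after the event;
            None means no restriction. Rule I always uses x=None, y=6.
    Returns None when no peak qualifies (no true turn).
    """
    if rule == "I":
        x, y = None, 6

    def in_window(t):
        return (t > start
                and (x is None or t >= event - x)
                and (y is None or t <= event + y))

    candidates = [p for p in peaks if in_window(p[0])]
    if not candidates:
        return None
    if rule in ("I", "II"):
        return max(candidates, key=lambda p: p[1])           # highest level
    return min(candidates, key=lambda p: abs(p[0] - event))  # Rule III: closest


peaks = [(2, 50), (5, 80), (9, 70), (13, 60)]
for rule in ("I", "II", "III"):
    month, level = true_peak(peaks, start=1, event=12, rule=rule, x=6, y=6)
    print(rule, month, "lead =", 12 - month)
# I 5 lead = 7
# II 9 lead = 3
# III 13 lead = -1
```

Each rule finds a true peak in this example, but the implied leads are 7, 3 and -1 months, which mirrors the point that the count of true turns can agree while their dating, and hence the average lead, differs.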

It is desirable to illustrate how sensitive the lead-reliability relationship of a 
predictor is to the choice of the rule used to identify true turns. The true turns 
of the diffusion index of the FRB Index were identified by all three Rules. 
Alexander had implicitly used Rule III with no restrictions on X or Y to
identify the turning points of the diffusion index.⁹ Rule I and Rule II, with X
less than 12 and no lags greater than 6, were used as alternative methods of dat- 
ing the true turns. The great sensitivity of the lead-reliability relationship to 
the choice of identification rule is clearly indicated from Table 1. While the reli- 
ability for all three methods is approximately equal for each and every smooth- 





8 As the criterion “number of months up or down” is varied, it is possible that the dating of the true turn will
change. For example, the predictor's turn which is closest to the predictand's peak may only be below peak for seven
months; if this is the case, the criterion “eight months down” would then identify the next closest turn as the true
peak. Since this peak has by definition a longer lead, the average lead of true turns may show some surprising
discontinuities as the smoothing criterion is varied.

9 In all fairness to Professor Alexander, it must be pointed out that he recognized the existence of multiple
turning points. (See Alexander, op. cit., pp. 292-4). However, the dating of the troughs of 1932 and 1952 and the 
peak of 1936 indicates that Rule III with no restriction on X or Y was implicit in his identification. Therefore, 
wherever multiple turns were indicated by Alexander, we assumed that if his assumption about the dates of the 
other turns had been made explicit, he would have used the last of the multiple predictions throughout. 





TABLE 1. FREQUENCY OF TRUE AND FALSE PREDICTIONS AND AVERAGE 
LEAD OF THE TURNS OF THE DIFFUSION INDEX OF THE FRB INDEX 
OVER THE TURNS OF THE FRB INDEX IN RELATION TO THE SMOOTHING 
CRITERION AND RULE TO IDENTIFY TRUE TURNS 
Oct. 1922-Dec. 1939, and Jan. 1948-Dec. 1955

Criterion: no. of months up or down | Rule I: No. of leads (True, False), Reliabilityᵇ, Av. lead of true turnsᵃ ᵈ | Rule II (X ≤ 12, Y ≤ 6): No. of leads (True, False), Reliabilityᵇ, Av. lead of true turnsᵈ | Rule III (with no restriction on X or Y)ᶜ: No. of leads (True, False), Reliabilityᵇ, Av. lead of true turnsᵈ

(Table entries are not legible in the source.)

ᵃ The true peak which forecast the 1948 recession occurred in November 1947; it was counted as occurring then rather than in
January 1948. With a one-month criterion, if the latter date had been used, the average lead would have been 10.2 months with Rule I.

ᵇ Reliability is defined as the frequency of true to total leads.

ᶜ When the restrictions X ≤ 12, Y ≤ 6 are imposed, one less true and one more false turn are identified for each of the criteria: 8,
10, and 15 months up or down. The average lead of the true turns for these criteria then became 0.1, -1.9, and -6.9 months,
respectively.

ᵈ The average lead of the true turns is equal to the actual average lead of the turns minus one less than the number of months
in the “number of months up or down” criterion (i.e., the lag needed to identify the turning points).


ing level, the average lead of the true turns shows a great deal of variability 
between the Rules. 

The next question to be posed is: which of the three Rules is the best to apply 
in making a comparison of the power of two forecasting methods? Intuitively, 
one would reject Rule III almost immediately. With this Rule, the dating of 
the true peaks and troughs fluctuates violently depending on which smoothing 
criterion is applied. Thus, using this Rule, both the number of false turns and 
the dates of the true turns fluctuate when different smoothing criteria are 
applied.¹⁰ For these reasons it is preferable not to utilize Rule III as a method
for identifying true turns.

As between Rules I and II where the latter has some restrictions on the tim- 
ing of true peaks, there is little to choose a priori. The customary method of 
identifying true turning points is based on Rule I (i.e., the highest peak since 
the previous trough). However, for forecasting purposes another aspect of the 
question might appropriately be considered. Suppose that at every one of the 
true peaks identified by Rule I, we would have made a forecast for the following
12 months. Let us assume that we would have predicted that the peak in the





10 As an illustration, the predictor's peak closest in timing to the predictand's peak may only be below peak level
for six months. It would be called a true peak with Rule III if the smoothing criteria up to and including “six months
up or down” are used. With the use of the “seven months up or down” criterion, this observation is no longer considered
a peak (whether true or false), and the number of observed peaks is reduced. Moreover, since this peak
formerly was a true peak, the next nearest peak which is below peak level for seven or more months is now considered
the true peak; the dating of the true peak has changed and a priori the average lead of the true turns has
been increased. With the other rules, true peaks or troughs are not as likely to be eliminated as they are in this
situation.





FORECASTING INDUSTRIAL PRODUCTION 873 


predictand will occur within the next 12 months. If the lead of our predictor 
had been greater than 12 months, we would have made an incorrect forecast. 
It is for this reason that the peak which is identified in the customary manner 
might appropriately be considered a false lead. Therefore, one of the peaks 
which under Rule I would be considered false might actually be the true peak 
with Rule II. The essence of the problem is that leads which are too long may 
yield incomplete or incorrect information. Individual discretion must determine which of the two rules is preferable.¹¹

In the previous analysis¹² of the lead-reliability relationship of the leading
series regression (LSR), Rule I was used to identify the true turns. Since Rule 
I was used there, it should also be applied to the diffusion index for the com- 
parison to be consistent. 

The evidence in Table 2 clearly indicates that the diffusion index is superior
to the LSR. For every lead, the diffusion index displays a higher degree of relia- 
bility. Moreover, for every positive lead when Rule II is applied to the diffu- 
sion index, the results of the diffusion index are superior to those of the LSR 
with Rule I applied.¹³ So far we have not invoked that portion of Rule I which
states that the forecast of the trough must follow the actual peak in the pre- 
dictand and have counted as true one trough of the LSR which violates this 
rule. 
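The two measures compared in Table 2 can be tallied as shown below. The table's note defines reliability as the number of true turns forecast divided by the total number of predictions, and the average lead is taken over true turns only. This is a sketch with invented turns. The `Turn` record and its fields are hypothetical, and the sketch assumes each predictor turn has already been classified as true or false under one of the Rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One turn of a predictor, already matched (or not) to a predictand turn."""
    is_true: bool
    lead_months: Optional[int] = None  # predictand date minus predictor date; true turns only


def reliability(turns: List[Turn]) -> float:
    """Ratio of true turns forecast to the total number of predictions."""
    return sum(t.is_true for t in turns) / len(turns)


def average_lead(turns: List[Turn]) -> float:
    """Mean lead, in months, over the true turns only."""
    leads = [t.lead_months for t in turns if t.is_true]
    return sum(leads) / len(leads)


# Invented record: three true turns and two false ones.
record = [Turn(True, 6), Turn(True, 2), Turn(False), Turn(True, -1), Turn(False)]
print(reliability(record))   # 0.6
print(average_lead(record))  # 7/3, about 2.33
```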


TABLE 2. RELATIONSHIP BETWEEN LENGTH OF LEAD AND RELIABILITYᵃ OF FORECAST FOR DIFFUSION INDEX OF THE FRB INDEX AND LEADING SERIES REGRESSION (LSR) IN RELATION TO SMOOTHING CRITERION APPLIED. RULE I USED TO IDENTIFY TRUE TURNS IN THE PREDICTORS

Oct. 1922–Dec. 1939 and Jan. 1948–Dec. 1955

                          Diffusion Index           LSR
Criterion (No. of
months up or down)       Lead    Reliability       Lead    Reliability

[Most numeric entries are illegible in the source scan. Legible reliability values, column assignment uncertain: .20, .30, .38, .44, .50, .54, .70; further values between .26 and .92 are only partly legible.]


a Reliability is defined as the ratio of the number of true turns forecast to the total number of predictions.


11 If the second is chosen, another question is raised: what is the specific time interval around the predictand's turn in which a true turn of a predictor can occur?

12 Alexander and Stekler, op. cit.

13 For the result, compare Table 2 with Table 1. A fortiori, the diffusion index with Rule II applied must yield results superior to those of the LSR with Rule II applied, for the application of Rule II as compared to Rule I can never increase either the reliability or the lead.





874 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


One peak of the FRB Index of Production was in March, 1927; the ensuing trough was in November, 1927; the LSR forecast that trough 10 months prior,¹⁴ or in February, 1927, thus violating the Rule. If an adjustment were made for this error, the diffusion index would be favored even more: the average lead of the LSR would be reduced, since this particular lead was longer than the average lead, and its reliability might be reduced, but in no event would it be increased.¹⁵ The results, therefore, clearly indicate that as a forecaster the diffusion index is superior to the LSR.


3. LEADING SERIES REGRESSION VS. INDEPENDENT 
COMPONENTS OF THE LEADING SERIES 


We can now turn to the second question: Are the independent components of 
the leading series better forecasters of the turning points of the FRB Index 
than the LSR? Two steps are required in the analysis. First, the leading series 
must be resolved into their independent components. Then the predictive be- 
havior of the components can be compared with that of the LSR. 

The principal components,¹⁶ which are the statistically independent elements of the leading series, were obtained for the period 1947–1956. In the process of calculating these independent components, it was possible to determine the percentage of the variance of the leading series that each component explained. Those components which explained the largest percentage of the variance of the leading series were then used to predict the turning points of the FRB Index of Production for the same period, 1947–1956.¹⁷

The first and second components respectively account for 55 per cent and 24 
per cent of the variance of the leading series, while the third adds only another 
8 per cent. Since the third and subsequent components each contain a very 
small percentage of the variance of the leading series, the predictive power of 
only the first two components was tested. Chart 1 indicates that the first component is the trend with the cycle superimposed upon it, while the second component is the cycle.
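Footnote 16 points to standard treatments of principal components. The sketch below is pure Python and is not the author's computation. It standardizes each series, forms the correlation matrix, and diagonalizes it. Each eigenvalue divided by the number of series is that component's share of the variance, the quantity reported above as 55, 24 and 8 per cent. Cyclic Jacobi rotations are used for the eigen-decomposition only to keep the example dependency-free, and the toy data are invented.

```python
import math


def standardize(xs):
    """Zero mean and unit (population) standard deviation, as in footnote 18."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def jacobi_eigen(sym, max_sweeps=100, tol=1e-24):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    p = len(sym)
    a, v = [row[:] for row in sym], identity(p)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                rot = identity(p)
                rot[i][i], rot[j][j], rot[i][j], rot[j][i] = c, c, s, -s
                a = matmul(transpose(rot), matmul(a, rot))  # drives a[i][j] to zero
                v = matmul(v, rot)                           # accumulates eigenvectors
    return [a[k][k] for k in range(p)], v


def principal_components(series):
    """series: p equally long time series. Returns eigenvalues, loadings and variance shares, largest first."""
    z = [standardize(s) for s in series]
    n, p = len(z[0]), len(z)
    corr = [[sum(z[r][t] * z[c][t] for t in range(n)) / n for c in range(p)] for r in range(p)]
    values, vectors = jacobi_eigen(corr)
    order = sorted(range(p), key=lambda k: values[k], reverse=True)
    eigenvalues = [values[k] for k in order]
    loadings = [[vectors[r][k] for r in range(p)] for k in order]
    shares = [lam / p for lam in eigenvalues]  # the trace of a correlation matrix is p
    return eigenvalues, loadings, shares


# Toy data: two perfectly correlated trending series and one alternating series.
t = range(10)
eigenvalues, loadings, shares = principal_components(
    [[float(x) for x in t], [2.0 * x for x in t], [(-1.0) ** x for x in t]])
print([round(s, 3) for s in shares])  # the first component carries about 69 per cent
```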

The magnitude of the coefficients associated with the two components¹⁸ indicates why the cycle is contained in both components. The first component is








14 Alexander and Stekler, op. cit., p. 404.

15 It would probably be reduced, for the LSR gave no indication of another trough during this period. Thus the number of leads would remain constant and the number of true leads would be reduced by one, thereby reducing the reliability.

16 For a detailed discussion of the methodology, see Gerhard Tintner, Econometrics, New York: John Wiley and Sons, Inc., 1952, pp. 102 ff., and Samuel S. Wilks, Mathematical Statistics, Princeton, New Jersey: Princeton University Press, 1943, pp. 252 ff.

17 The leading series are of course available for earlier periods, but changes of base and the resulting incomparability of the data within each series for the period 1919–1956 necessitated the use of the shorter period.

18 The equation for the first component is

$$Z_{1t} = .3575X_{1t} + .4576X_{2t} + .3963X_{3t} + .4107X_{4t} + .3587X_{5t} + .1896X_{6t} + .3942X_{7t} - .1337X_{8t} \qquad (1)$$

where $Z_{1t}$ is the principal component and the $X_{it}$ are the standardized leading series (i.e., zero mean and standard deviation equal to one), as follows: $X_1$, business failures, liabilities; $X_2$, stock prices; $X_3$, new orders, durable goods; $X_4$, residential construction contracts; $X_6$, hours worked; $X_7$, new incorporations; and $X_8$, wholesale spot prices. Likewise, the equation of the second component is

$$Z_{2t} = -.3608X_{1t} - .0694X_{2t} + .2456X_{3t} - .0057X_{4t} + .2469X_{5t} + .5936X_{6t} - .1365X_{7t} + .6105X_{8t}. \qquad (2)$$







heavily weighted with those series which have both the trend and a cyclical 
movement in common. Hours worked and wholesale prices, which are rela- 
tively stationary series but still contain cyclical variations, have low weights in 
the first component but are heavily weighted in the second. From this it can be 
concluded that the first component best represents the dominant economic 
movement shared by the non-stationary series. That part of the cycle which is 
within the stationary leading series is explained by the second principal com- 
ponent. 
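The weighting pattern described here can be read directly from the coefficients of equations (1) and (2) in footnote 18. The sketch below, in plain Python, stores those coefficients in the order $X_1$ to $X_8$ with hypothetical variable names. It checks that the two vectors are orthonormal up to their four-digit rounding, as eigenvectors of a correlation matrix should be. It also shows how a component value would be formed for one month of standardized data, using an invented month.

```python
# Coefficients of equations (1) and (2) in footnote 18, in the order X1 ... X8.
Z1 = [0.3575, 0.4576, 0.3963, 0.4107, 0.3587, 0.1896, 0.3942, -0.1337]
Z2 = [-0.3608, -0.0694, 0.2456, -0.0057, 0.2469, 0.5936, -0.1365, 0.6105]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def component_score(loadings, standardized_month):
    """Value of a component for one month, given the eight standardized series."""
    return dot(loadings, standardized_month)


# Unit length and mutual orthogonality, up to the 4-digit rounding of the published values.
print(round(dot(Z1, Z1), 4), round(dot(Z2, Z2), 4), round(dot(Z1, Z2), 4))
# A hypothetical month in which every series is one standard deviation above its mean:
print(round(component_score(Z1, [1.0] * 8), 4), round(component_score(Z2, [1.0] * 8), 4))
```

The largest weights in the second component fall on $X_6$ and $X_8$, and both are small or negative in the first component.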

Both the components and the LSR forecast every cyclical turn of the FRB 
Index. However, particular attention might be paid to the first component’s 
prediction of the turns of 1953–1954. The peak was predicted contemporaneously with its occurrence in July, 1953, but the trough of 1954 was predicted only one month later. Thus in the strict application of the Rules,¹⁹ all
the turns were predicted, but from an operational point of view, doubt would 
have been cast upon the validity of the predictions. Nevertheless, for further 
analysis, we shall assume that all the FRB’s turns were correctly predicted by 
all the predictors. 

Turning once again to the lead-reliability relationship of the predictors (see Table 3), it is apparent that the first component would, for all positive leads, be considered superior to the LSR. Except for leads between 1 and −1 months, there is very little to distinguish between the results of the LSR and the second component.²⁰ Thus it can be inferred that either of the components is as good a predictor as the LSR.²¹


TABLE 3. RELATIONSHIP BETWEEN LENGTH OF LEAD AND RELIABILITY OF FORECASTᵃ FOR LEADING SERIES REGRESSION (LSR) AND FIRST TWO PRINCIPAL COMPONENTS OF LEADING SERIES IN RELATION TO SMOOTHING CRITERION APPLIED (1/47–9/56)

Rule I used to identify the true turns of the predictors

                       First Principal        Second Principal
Criterion:             Component              Component              LSR
Months up or down      Average Lead  Rel.     Average Lead  Rel.     Average Lead  Rel.

[Most numeric entries are illegible in the source scan; legible reliability values include .15, .30, .40, .54, .67, and 1.00.]

a Reliability is defined as the ratio of the number of true turns forecast to the total number of predictions.

19 Rule I was used to identify the true turns.


[Chart 1. Levels of First and Second Principal Component of the 8 Leading Series, 1947–1956 (with mean level included). Panels: First Orthogonal Component; Second Orthogonal Component. Markers distinguish true turns from false turns under two months-up-or-down criteria whose values are illegible.]


4. CONCLUSIONS 
It is therefore possible to conclude that

(1) The lead-reliability relationship, while an appropriate technique for comparing the predictive power of alternative forecasting methods, is very sensitive to the rule used to identify true turns.

(2) The diffusion index constructed from the components of the FRB Index of Production is superior to the LSR in forecasting the turning points of the Index.

(3) The principal components of the leading series are equal to or better than the LSR as forecasters of the turning points of the FRB Index.


20 However, it must be recognized that the LSR was obtained from a regression of the FRB Index on the leading series for the period under consideration, whereas the components were not fit to the FRB data.

21 For the period January 1947–September 1956, the diffusion index of the FRB Index of Production outperforms the orthogonal components. With the one-month criterion, the diffusion index has an average lead of nine months and a reliability of .33, which is superior to the performance of both of the orthogonal components. For the successively larger criteria of "2, 3, 4, ..., 8 months up or down," the diffusion index shows these levels of reliability: .40, .50, .60, .75, .75, .75, and .75. It is only when a criterion of nine or more months up or down is used that the performance of the independent components equals or exceeds that of the diffusion index.








NON-ADDITIVITY IN TWO-WAY ANALYSIS OF VARIANCE 


JOHN MANDEL
National Bureau of Standards 


In two-way classification analysis of variance situations there often 
exists a systematic type of row column interaction. A model is proposed 
in which the interaction is of the type $\theta_i\gamma_j$, where $\theta_i$ is a parameter of
the $i$th row, not necessarily associated with the main effect for rows, and
$\gamma_j$ is the main effect for column $j$. The analysis of data according to this
model is given, including estimation and tests of significance. The model 
is more general than that involved in Tukey’s “one degree of freedom 
for non-additivity” and includes the latter as a special case. The rela- 
tionship between the two methods is discussed. Applications of the 
method to different types of problems are mentioned and a numerical 
example is included. 


1. THE MODEL 
CONSIDER a set of observations $y_{ij}$ (see Table I) classified according to two criteria, $A_i$ $(i = 1, \ldots, m)$ and $B_j$ $(j = 1, \ldots, n)$. We will assume that the errors of observation for all $y_{ij}$ constitute a sample from a normal population with zero mean and variance $\sigma^2$. A general model, including both the "additive" and the "non-additive" case, is represented by the following relation:

$$y_{ij} = \mu + \rho_i + \gamma_j + \tau_{ij} + \varepsilon_{ij}, \qquad i = 1, \ldots, m;\; j = 1, \ldots, n, \qquad (1)$$


in which $\varepsilon_{ij}$ is the random error, and we assume furthermore:

$$\sum_i \rho_i = \sum_j \gamma_j = 0, \qquad (2a)$$

$$\sum_i \tau_{ij} = \sum_j \tau_{ij} = 0. \qquad (2b)$$


The additive case is characterized by the condition $\tau_{ij} = 0$ for all $i$ and $j$.

We will adopt the commonly used dot subscript to indicate an arithmetic average over all admissible values of the subscript, e.g.:

$$\bar y_{.j} = \frac{\sum_i y_{ij}}{m}.$$


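Denoting an expected value by the symbol $E$, we derive from (1), (2a), and (2b):

$$E(y_{ij}) = \mu + \rho_i + \gamma_j + \tau_{ij},$$

$$E(\bar y_{.j}) = \mu + \gamma_j \qquad \text{(since } \textstyle\sum_i \rho_i = 0 \text{ and } \sum_i \tau_{ij} = 0\text{)},$$

hence:

$$E(y_{ij} - \bar y_{.j}) = \rho_i + \tau_{ij}. \qquad (3)$$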
Equation (3) provides us with a useful characterization of non-additivity. 







NON-ADDITIVITY IN ANALYSIS OF VARIANCE 
First we observe that in the case of additivity we have 
$$E(y_{ij} - \bar y_{.j}) = \rho_i,$$


a quantity that is independent of j. In terms of Table I, additivity thus implies 
that for any row, the difference between any element in that row and the 
corresponding column-average is, apart from random error, a constant asso- 
ciated with the row. 

Non-additive situations can now be introduced by making definite assumptions regarding the dependence of $E(y_{ij} - \bar y_{.j})$ on $j$. One simple and, as we will show, useful class of non-additivity situations is obtained by assuming that for any given row, the quantity $E(y_{ij} - \bar y_{.j})$ is a linear function¹ of $\gamma_j$:

$$E(y_{ij} - \bar y_{.j}) = \rho_i + \theta_i\gamma_j.$$

This is equivalent to assuming that

$$\tau_{ij} = \theta_i\gamma_j. \qquad (4)$$


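TABLE I

            B_1     B_2     ...   B_j     ...   B_n     Average
A_1         y_11    y_12    ...   y_1j    ...   y_1n    ȳ_1.
 ⋮
A_i         y_i1    y_i2    ...   y_ij    ...   y_in    ȳ_i.
 ⋮
A_m         y_m1    y_m2    ...   y_mj    ...   y_mn    ȳ_m.
Average     ȳ_.1    ȳ_.2    ...   ȳ_.j    ...   ȳ_.n    ȳ_..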


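From (2b) and (4) it follows that

$$\sum_i \theta_i = 0. \qquad (5)$$

This paper deals with the analysis of data conforming to this assumption. More specifically, we treat the model

$$y_{ij} = \mu + \rho_i + \gamma_j + \theta_i\gamma_j + \varepsilon_{ij}, \qquad (6)$$

where

$$\sum_i \rho_i = \sum_j \gamma_j = \sum_i \theta_i = 0. \qquad (7)$$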
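The role of conditions (7) can be made concrete by simulating data from model (6) in plain Python. In this example the supplied effects are centered so that they sum to zero. That centering is an implementation choice of the example, and it is what makes $\bar y_{.j}$ estimate $\mu + \gamma_j$. The function name and the numbers are hypothetical.

```python
import random


def center(values):
    """Subtract the mean so that the effects sum to zero, as conditions (7) require."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def simulate_model_6(mu, rho, gamma, theta, sigma, seed=0):
    """Draw y[i][j] = mu + rho_i + gamma_j + theta_i * gamma_j + e_ij (equation (6)).

    rho, gamma and theta are centered first; sigma = 0 returns the expected values.
    """
    rng = random.Random(seed)
    rho, gamma, theta = center(rho), center(gamma), center(theta)
    return [[mu + r + g + th * g + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0)
             for g in gamma] for r, th in zip(rho, theta)]


y = simulate_model_6(10.0, [1, 2, 6], [-3, 0, 1, 2], [0.2, 0.0, -0.5], sigma=0.0)
m, n = len(y), len(y[0])
col_mean = [sum(y[i][j] for i in range(m)) / m for j in range(n)]
# Equation (3) with (4): without error, y_ij - ybar_.j equals rho_i + theta_i * gamma_j.
print([[round(y[i][j] - col_mean[j], 6) for j in range(n)] for i in range(m)])
```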


2. ANALYSIS OF VARIANCE 


In the two-way classification data, let: 


1 We will show, in the last section of this paper, that the procedure can be generalized to polynomial functions
of higher degree. 







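$$\bar y = \bar y_{..}, \qquad R_i = \bar y_{i.} - \bar y_{..}, \qquad C_j = \bar y_{.j} - \bar y_{..}.$$

Consider the identity:

$$y_{ij} = \bar y + R_i + C_j + (b_i - 1)C_j + \Delta_{ij}, \qquad (8)$$

where

$$\Delta_{ij} = y_{ij} - \bar y_{i.} - b_i C_j.$$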


If we now define the quantity $b_i$ by:

$$b_i = \frac{\sum_j y_{ij} C_j}{\sum_j C_j^2}, \qquad (9)$$


it follows that 


and it is easily verified that 


$$\sum_i\sum_j y_{ij}^2 = mn\,\bar y^2 + n\sum_i R_i^2 + m\sum_j C_j^2 + \sum_i (b_i - 1)^2 \sum_j C_j^2 + \sum_i\sum_j \Delta_{ij}^2. \qquad (10)$$


In accordance with this identity, an analysis of variance can formally be 
made as shown in Table II. 

A subsequent section will deal with the validity of tests of significance based 
on Table II. We will first attempt to interpret the analysis in terms of the non- 
additive model of equation (6). 
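The quantities in Table II can be computed directly from their definitions. The minimal pure-Python sketch below uses a hypothetical function name and an invented table. It forms $R_i$, $C_j$, the slopes $b_i$ of equation (9), the residuals $\Delta_{ij}$ of identity (8), and the sums of squares with their degrees of freedom. A second, exactly additive table shows the "slopes" term vanishing.

```python
def mandel_anova(y):
    """Sums of squares and degrees of freedom of Table II for an m x n table y (list of rows)."""
    m, n = len(y), len(y[0])
    grand = sum(map(sum, y)) / (m * n)
    row_mean = [sum(row) / n for row in y]
    col_mean = [sum(y[i][j] for i in range(m)) / m for j in range(n)]
    R = [r - grand for r in row_mean]
    C = [c - grand for c in col_mean]
    sum_c2 = sum(c * c for c in C)
    b = [sum(y[i][j] * C[j] for j in range(n)) / sum_c2 for i in range(m)]               # equation (9)
    delta = [[y[i][j] - row_mean[i] - b[i] * C[j] for j in range(n)] for i in range(m)]  # from (8)
    ss = {
        "total": sum(v * v for row in y for v in row),
        "mean": m * n * grand ** 2,
        "rows": n * sum(r * r for r in R),
        "columns": m * sum_c2,
        "slopes": sum((bi - 1) ** 2 for bi in b) * sum_c2,
        "residual": sum(v * v for row in delta for v in row),
    }
    df = {"total": m * n, "mean": 1, "rows": m - 1, "columns": n - 1,
          "slopes": m - 1, "residual": (m - 1) * (n - 2)}
    return ss, df, b


# Invented 4 x 5 table; identity (10) holds for any data.
y = [[3.1, 4.0, 5.2, 6.1, 7.3],
     [2.0, 3.5, 5.1, 6.4, 8.2],
     [4.2, 4.6, 5.0, 5.3, 5.9],
     [1.5, 3.9, 6.2, 8.8, 11.0]]
ss, df, b = mandel_anova(y)
parts = ["mean", "rows", "columns", "slopes", "residual"]
print(ss["total"] - sum(ss[k] for k in parts))  # ~0: identity (10)
print(sum(b))                                   # the slopes average to 1, so this is ~m = 4
f_slopes = (ss["slopes"] / df["slopes"]) / (ss["residual"] / df["residual"])
print(round(f_slopes, 2))

# An exactly additive table has every b_i = 1 and hence no "slopes" sum of squares.
ss_add, _, _ = mandel_anova([[i + 2 * j for j in range(5)] for i in range(4)])
print(ss_add["slopes"])
```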


TABLE II. ANALYSIS OF VARIANCE

Source        D.F.               S.S.
Total         mn                 $\sum_i\sum_j y_{ij}^2$
Mean          1                  $mn\,\bar y^2$
Rows          m − 1              $n\sum_i R_i^2$
Columns       n − 1              $m\sum_j C_j^2$
Slopes        m − 1              $\sum_i (b_i - 1)^2 \sum_j C_j^2$
Residual      (m − 1)(n − 2)     $\sum_i\sum_j \Delta_{ij}^2$













3. INTERPRETATION 


Comparing equations (6) and (8), we see that the two terms $\theta_i\gamma_j$ and $\varepsilon_{ij}$ correspond respectively to $(b_i - 1)C_j$ and $\Delta_{ij}$. This suggests that the parameter $\theta_i$ corresponds to the statistic $b_i - 1$. Let $\theta_i + 1 = \beta_i$. Then the model of equation (6) can be written:

$$y_{ij} = \mu + \rho_i + \gamma_j + (\beta_i - 1)\gamma_j + \varepsilon_{ij}. \qquad (11)$$

Let $\mu_i = \mu + \rho_i$. For any given $i$, we then have the linear relation:

$$y_{ij} = \mu_i + \beta_i\gamma_j + \varepsilon_{ij}. \qquad (12)$$

Thus the data of Table I represent a bundle of $m$ straight lines, corresponding to the $m$ rows, and differing from each other in both parameters $\mu_i$ and $\beta_i$. From equation (5) it follows that

$$\bar\beta = \frac{\sum_i \beta_i}{m} = \frac{\sum_i (1 + \theta_i)}{m} = 1.$$

If all $\beta_i = 1$, the model reduces to the simple additive model. We will show in the following section that, under the hypothesis $\beta_i = 1$ $(i = 1, 2, \ldots, m)$, the sum of squares for "slopes" in Table II is distributed in accordance with

$$\sum_i (b_i - 1)^2 \sum_j C_j^2 \sim \sigma^2\chi^2_{m-1},$$

where $\chi^2_{m-1}$ is a chi-square variate with $(m - 1)$ degrees of freedom. Hence this sum of squares can be used to test the hypothesis of additivity.

What the analysis of Table II has accomplished is a breakdown of the $(m - 1)(n - 1)$ degrees of freedom for interaction into two portions: $(m - 1)$ degrees of freedom corresponding to non-additivity and $(m - 1)(n - 2)$ degrees of freedom for random error.

4. TESTS OF SIGNIFICANCE 


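Consider the observations $y_{ij}$ of Table I and assume that there exist $n$ fixed quantities $x_j$ such that

$$\sum_{j=1}^{n} x_j = 0 \qquad \text{and} \qquad y_{ij} = \mu_i + \beta_i x_j + \varepsilon_{ij}.$$

Then the least squares estimate of $\beta_i$ is

$$b_i' = \frac{\sum_j y_{ij} x_j}{\sum_j x_j^2},$$

and the variance of $b_i'$ is

$$\operatorname{Var}(b_i') = \frac{\operatorname{Var}(\varepsilon)}{\sum_j x_j^2} = \frac{\sigma^2}{\sum_j x_j^2}. \qquad (13)$$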







Under the hypothesis $H: \beta_1 = \beta_2 = \cdots = \beta_m$, an estimate of the variance given by equation (13) is obtained by:

$$\widehat{\operatorname{Var}}(b_i') = \frac{\sum_i (b_i' - \bar b')^2}{m - 1}.$$

Consequently, under this hypothesis, and for the given set of $x_j$, the quantity

$$\sum_{i=1}^{m} (b_i' - \bar b')^2 \sum_j x_j^2$$

is distributed as $\sigma^2\chi^2$, where $\chi^2$ has $(m - 1)$ degrees of freedom.
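Consider² the following two sets of linear forms of the $y_{ij}$:

$\{C_j\}$, where $C_j = \bar y_{.j} - \bar y_{..}$;
$\{L_{ij}\}$, where $L_{ij} = (b_i' - \bar b')\,x_j$.

Then we have

$$\sum_i\sum_j L_{ij}^2 = \sum_i (b_i' - \bar b')^2 \sum_j x_j^2.$$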


It is easily verified that each element of $\{C_j\}$ is orthogonal to each element of $\{L_{ij}\}$. Under the assumptions made for the distribution of the errors $\varepsilon_{ij}$, the two sets are therefore statistically independent. Consequently, the quantity $\sigma^{-2}\sum_i\sum_j L_{ij}^2$, which is a function of the $L_{ij}$, is also statistically independent of the $C_j$. Hence its conditional distribution, given the $C_j$, is identical with its unconditional distribution. Under the hypothesis $\beta_1 = \beta_2 = \cdots = \beta_m$, the distribution of $\sigma^{-2}\sum_i\sum_j L_{ij}^2$ is that of $\chi^2$ with $m - 1$ degrees of freedom for any fixed set of $x_j$ such that $\sum_j x_j = 0$; we have now proved that this distribution is unchanged if we consider the $C_j$ fixed and select for the $x_j$ the values $x_j = C_j$. But under those conditions $b_i' = b_i$ and $\bar b' = 1$, so that

$$\sum_i\sum_j L_{ij}^2 = \sum_i (b_i - 1)^2 \sum_j C_j^2,$$

where $b_i$ is given by (9). We conclude therefore that the quantity $\sigma^{-2}\sum_i (b_i - 1)^2 \sum_j C_j^2$ has, under the hypothesis $\beta_1 = \beta_2 = \cdots = \beta_m$, a chi-square distribution with $m - 1$ degrees of freedom and is statistically independent of the $C_j$.

From the identity (10) and the preceding discussion, it follows that the quantity $\sigma^{-2}\sum_i\sum_j \Delta_{ij}^2$ is distributed according to chi-square with $(m - 1)(n - 2)$ degrees of freedom. Consequently, the analysis in Table II can be used for the usual tests of significance. In particular, the hypothesis of additivity, $\beta_i = 1$ $(i = 1, \ldots, m)$, is tested by comparing the mean squares for "slopes" and "residual" by the F-test.


2 The proof given here is based on a line of reasoning used by Scheffé [3] for a similar problem (pp. 132-3). 
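The exactness of this F-test can be checked by simulation. The sketch below is in pure Python; the design, the effects and the function names are invented for illustration. It draws tables from equation (11) with unit error variance and averages the slopes/residual F ratio. Under additivity the average should be close to the mean of an F distribution with $m - 1$ and $(m - 1)(n - 2)$ degrees of freedom, which is $d_2/(d_2 - 2)$ with $d_2 = (m - 1)(n - 2)$. Unequal slopes should push the average well above that.

```python
import random


def slopes_vs_residual_f(y):
    """F ratio of the 'slopes' and 'residual' mean squares of Table II."""
    m, n = len(y), len(y[0])
    grand = sum(map(sum, y)) / (m * n)
    row_mean = [sum(row) / n for row in y]
    C = [sum(y[i][j] for i in range(m)) / m - grand for j in range(n)]
    sum_c2 = sum(c * c for c in C)
    b = [sum(y[i][j] * C[j] for j in range(n)) / sum_c2 for i in range(m)]  # equation (9)
    ss_slopes = sum((bi - 1) ** 2 for bi in b) * sum_c2
    ss_resid = sum((y[i][j] - row_mean[i] - b[i] * C[j]) ** 2
                   for i in range(m) for j in range(n))
    return (ss_slopes / (m - 1)) / (ss_resid / ((m - 1) * (n - 2)))


def mean_f(beta, reps=3000, seed=1):
    """Average F over tables drawn from y_ij = mu + rho_i + beta_i * gamma_j + e_ij, e_ij ~ N(0, 1)."""
    rng = random.Random(seed)
    rho = [-2.0, -1.0, 0.0, 1.0, 2.0]            # hypothetical row effects (sum to zero)
    gamma = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]    # hypothetical column effects (sum to zero)
    total = 0.0
    for _ in range(reps):
        y = [[10.0 + r + bt * g + rng.gauss(0.0, 1.0) for g in gamma]
             for r, bt in zip(rho, beta)]
        total += slopes_vs_residual_f(y)
    return total / reps


m, n = 5, 6
df2 = (m - 1) * (n - 2)
additive = mean_f([1.0] * m)                     # all beta_i = 1
unequal = mean_f([0.5, 0.75, 1.0, 1.25, 1.5])    # slopes differ, mean slope still 1
print(round(additive, 3), "vs mean of F(4, 16):", round(df2 / (df2 - 2), 3))
print(round(unequal, 1))
```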








5. CONCURRENCE OF THE BUNDLE OF LINES 
In 1949, Tukey [4] proposed a test for non-additivity based on a single degree of freedom. In the notation used in the present paper, the sum of squares for this single degree of freedom is:

$$\frac{\left[\sum_i\sum_j R_i C_j y_{ij}\right]^2}{\sum_i R_i^2 \sum_j C_j^2},$$

which, in view of equation (9), and since $\sum_i R_i = 0$, can also be written:

$$\frac{\left[\sum_i (b_i - 1) R_i\right]^2}{\sum_i R_i^2} \sum_j C_j^2.$$

This expression suggests that the single degree of freedom proposed by Tukey reflects the linear regression of $b_i$ on $R_i$ or, in terms of the population parameters, the linear regression of $\beta_i$ on $\rho_i$.

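Let $\alpha$ represent the regression coefficient of $\beta_i$ on $\rho_i$; define $\delta_i$ by the relation:

$$\beta_i - 1 = \alpha\rho_i + \delta_i \qquad (i = 1, \ldots, m). \qquad (14)$$

Since $\sum_i (\beta_i - 1) = 0$ and $\sum_i \rho_i = 0$, it follows that

$$\sum_i \delta_i = 0. \qquad (15)$$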


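The analysis of variance given in Table II can be extended to include the information contained in equation (14). Consider the identity

$$b_i - 1 = \hat\alpha R_i + \left[(b_i - 1) - \hat\alpha R_i\right], \qquad d_i = (b_i - 1) - \hat\alpha R_i,$$

where $\hat\alpha = \sum_i (b_i - 1) R_i \big/ \sum_i R_i^2$ is the least squares slope, so that $\sum_i R_i d_i = 0$. Then the following identity holds:

$$\sum_i (b_i - 1)^2 = \frac{\left[\sum_i (b_i - 1) R_i\right]^2}{\sum_i R_i^2} + \sum_i d_i^2.$$

Hence:

$$\sum_i (b_i - 1)^2 \sum_j C_j^2 = \frac{\left[\sum_i (b_i - 1) R_i\right]^2}{\sum_i R_i^2} \sum_j C_j^2 + \sum_i d_i^2 \sum_j C_j^2.$$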







The degrees of freedom associated with the two terms on the right-hand side are 1 and $(m - 2)$, respectively. Thus the source of variation denoted as "slopes" is partitioned as follows:


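TABLE III

Source                             D.F.      S.S.
Slopes                             m − 1     $\sum_i (b_i - 1)^2 \sum_j C_j^2$
    Regression of β on μ           1         $\frac{[\sum_i (b_i - 1)R_i]^2}{\sum_i R_i^2} \sum_j C_j^2$
    Deviations from regression     m − 2     $\sum_i d_i^2 \sum_j C_j^2$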








By reasoning similar to that of the preceding section, one can establish the validity of testing, by means of the F distribution, the hypotheses:

$H: \alpha = 0$

$H: \delta_i = 0 \quad (i = 1, 2, \ldots, m).$

For the first hypothesis, the degrees of freedom of F are 1 and $m - 2$; for the second hypothesis, they are $(m - 2)$ and $(m - 1)(n - 2)$.
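Two algebraic facts used in this section can be confirmed numerically. First, Tukey's single-degree-of-freedom sum of squares equals the "regression of β on μ" term. Second, the "slopes" sum of squares splits exactly into that term plus the deviations term of Table III. The sketch below is plain Python with a hypothetical function name and the same kind of invented table.

```python
def slope_partition(y):
    """Tukey's single-d.f. sum of squares and the Table III split of the 'slopes' sum of squares."""
    m, n = len(y), len(y[0])
    grand = sum(map(sum, y)) / (m * n)
    R = [sum(row) / n - grand for row in y]
    C = [sum(y[i][j] for i in range(m)) / m - grand for j in range(n)]
    sum_r2, sum_c2 = sum(r * r for r in R), sum(c * c for c in C)
    b = [sum(y[i][j] * C[j] for j in range(n)) / sum_c2 for i in range(m)]   # equation (9)
    tukey = sum(R[i] * C[j] * y[i][j] for i in range(m) for j in range(n)) ** 2 / (sum_r2 * sum_c2)
    alpha_hat = sum((bi - 1) * r for bi, r in zip(b, R)) / sum_r2            # least squares slope
    d = [(bi - 1) - alpha_hat * r for bi, r in zip(b, R)]
    return {
        "tukey": tukey,
        "slopes": sum((bi - 1) ** 2 for bi in b) * sum_c2,
        "concurrence": alpha_hat ** 2 * sum_r2 * sum_c2,   # regression of beta on mu, 1 d.f.
        "non_concurrence": sum(di * di for di in d) * sum_c2,  # deviations, m - 2 d.f.
    }


y = [[3.1, 4.0, 5.2, 6.1, 7.3],
     [2.0, 3.5, 5.1, 6.4, 8.2],
     [4.2, 4.6, 5.0, 5.3, 5.9],
     [1.5, 3.9, 6.2, 8.8, 11.0]]
parts = slope_partition(y)
print(parts["tukey"] - parts["concurrence"])                               # ~0
print(parts["slopes"] - parts["concurrence"] - parts["non_concurrence"])   # ~0
```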

A geometrical interpretation can be given for the breakdown of interaction into three portions with 1, $m - 2$, and $(m - 1)(n - 2)$ degrees of freedom, respectively. It has already been mentioned that the model represented by equation (6), or its equivalent form, equation (12), is that of a bundle of $m$ straight lines differing both in location and slope. Consider now equation (14), which establishes a relationship between the slopes $\beta_i$ and the location parameters $\mu_i$. If all $\delta_i$ are identically zero, this relationship is strictly linear and becomes a necessary and sufficient condition for the concurrence of the $m$ straight lines of model (12). Indeed, with $\delta_i = 0$ we have $E(y_{ij}) = (\mu + \gamma_j) + \rho_i(1 + \alpha\gamma_j)$, which does not depend on $i$ when $\gamma_j = -1/\alpha$. The point of concurrence is therefore a point on the main bisector of the axes in a plot of $E(y_{ij})$ versus $E(\bar y_{.j}) = \mu + \gamma_j$. Its coordinates are given by the expression:

$$E(\bar y_{.j}) = E(y_{ij}) = \mu - \frac{1}{\alpha}. \qquad (16)$$


In the analysis of variance (Table III), the single degree of freedom denoted "regression of β on μ" will reflect the entire row-column interaction if the data originate from a bundle of concurrent lines. In that case the mean square denoted "deviations from regression" should not significantly exceed that denoted "residual" in Table II. The converse statement is not always true: single outlying values may sometimes result in a spurious significance for the mean square associated with the single degree of freedom. In order to guard against such misinterpretations of the data, a plot should be made of $b_i$ (the estimate for $\beta_i$) versus $\bar y_{i.}$ (the estimate for $\mu_i$). A distinct linear trend in this plot is a definite indication of a tendency for concurrence of the $m$ lines of the bundle.
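The concurrence point of equation (16) can be checked numerically under the assumption $\delta_i = 0$. With $\beta_i = 1 + \alpha\rho_i$, every row line evaluated at the column level $\mu - 1/\alpha$ returns the same value, $\mu - 1/\alpha$. The parameter values are arbitrary and the helper is hypothetical.

```python
def expected_cell(mu, rho_i, alpha, column_level):
    """E(y_ij) as a function of x = mu + gamma_j when beta_i = 1 + alpha * rho_i (delta_i = 0)."""
    gamma_j = column_level - mu
    beta_i = 1.0 + alpha * rho_i
    return mu + rho_i + beta_i * gamma_j


mu, alpha = 5.0, 0.25
rhos = [-3.0, -1.0, 0.5, 3.5]     # row effects summing to zero
x0 = mu - 1.0 / alpha             # equation (16)
print(x0, [expected_cell(mu, r, alpha, x0) for r in rhos])  # 1.0 [1.0, 1.0, 1.0, 1.0]
```

At any other column level the rows give different values, so the lines meet only at $(x_0, x_0)$ on the bisector.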










Ward and Dick [5] have discussed a non-linear model of the following type:

$$y_{ij} = \mu + \rho_i + \gamma_j + \theta\rho_i\gamma_j + \varepsilon_{ij}.$$

A comparison of this relation with equation (6) shows that the model treated in the present paper is more flexible than Ward and Dick's. The latter model, like Tukey's, allocates one degree of freedom for non-additivity, whereas in our model $m - 1$ degrees of freedom are available for departures from additivity.


6. APPLICATIONS 


The analysis of non-additivity presented in the preceding sections is based 
on the assumption that equation (6), subject to conditions (7), applies to the 
data. There are two general types of two-way classification data for which the 
analysis is pertinent. In one of these types, the model is exact. In the second it 
is an empirical representation of the data. 


(1) The exact case 


Let $y_{ij}$ be a function of two classification criteria $A_i$ and $B_j$ and subject to random experimental error. More specifically, let $f_1(A)$ and $f_2(A)$ be two functions of $A$, and $f_3(B)$ a function of $B$, such that

$$y_{ij} = f_1(A_i) + \left[f_2(A_i)\right]\left[f_3(B_j)\right] + \varepsilon_{ij}, \qquad (17)$$

where $\varepsilon_{ij}$ is a random error. Then it is possible to analyze the set of data $y_{ij}$ in terms of the model of equation (12).

For example, Van der Waals' equation for the relationship between the pressure, volume and temperature of gases is of the form

$$\left(p + \frac{a}{V^2}\right)(V - b) = RT.$$

This relation can be written:

$$p = -\frac{a}{V^2} + \left[\frac{R}{V - b}\right] T,$$

which is indeed of the form of (17).

If $p$ were measured at preselected values of $V$ and $T$, the data could be analyzed in strict accordance with the method described in this paper. This analysis would not necessitate the actual values of $V$ and $T$. However, for a study of the applicability of Van der Waals' equation in its given functional form, the functions $f_1$, $f_2$ must then be related to the values of $V$ and the function $f_3$ to $T$. For the Van der Waals' equation, the functions $f_1$, $f_2$, and $f_3$ correspond respectively to

$$-\frac{a}{V^2}, \qquad \frac{R}{V - b}, \qquad \text{and} \qquad T.$$

Furthermore, comparing equations (12) and (17), the functions $f_1$, $f_2$,







and $f_3$ can be related to the quantities $\mu_i$, $\beta_i$, and $\gamma_j$.³ In this fashion, the problem of fitting a surface to the response $p$ in terms of the factors $V$ and $T$ is
reduced to the fitting of three functions of a single factor each. Of course, the 
parameters a and b occurring in Van der Waals’ equation can be estimated from 
these fits. 
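The sketch below illustrates the exact case. It tabulates pressures from Van der Waals' equation on a grid of volumes (rows) and temperatures (columns), then applies the slope computation of equation (9) to the noise-free table. The constants are illustrative assumptions of roughly the size tabulated for carbon dioxide, not data from the paper. Each row is exactly linear in the column averages, so the residuals vanish. The slopes reproduce $f_2(V_i) = R/(V_i - b)$ divided by its average over the rows, which is the normalization $\bar\beta = 1$.

```python
R_GAS = 0.08314                  # gas constant, L·bar/(mol·K)
A_VDW, B_VDW = 3.64, 0.04267     # assumed illustrative constants: L²·bar/mol² and L/mol


def vdw_pressure(volume, temperature):
    """p = -a/V**2 + [R/(V - b)] * T, i.e. f1(V) + f2(V) * f3(T) as in (17)."""
    return -A_VDW / volume ** 2 + R_GAS / (volume - B_VDW) * temperature


volumes = [0.5, 0.8, 1.2, 2.0, 3.0]           # rows: molar volumes, L/mol
temperatures = [280.0, 300.0, 320.0, 350.0]   # columns: K
y = [[vdw_pressure(v, t) for t in temperatures] for v in volumes]

m, n = len(y), len(y[0])
grand = sum(map(sum, y)) / (m * n)
row_mean = [sum(row) / n for row in y]
C = [sum(y[i][j] for i in range(m)) / m - grand for j in range(n)]
b = [sum(y[i][j] * C[j] for j in range(n)) / sum(c * c for c in C) for i in range(m)]
max_resid = max(abs(y[i][j] - row_mean[i] - b[i] * C[j]) for i in range(m) for j in range(n))
f2 = [R_GAS / (v - B_VDW) for v in volumes]
print(max_resid)                                  # essentially zero: rows are exact lines
print([round(bi, 6) for bi in b])
print([round(x / (sum(f2) / m), 6) for x in f2])  # same values: b_i = f2(V_i) / mean f2
```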


(2) The empirical case 


Consider a response $y_{ij}$ depending on two factors $A_i$ and $B_j$, such as is shown in Table I. Let us denote by $G_i$ the polygonal line obtained by joining the neighboring points of the set $[\bar y_{.j}, y_{ij}]$, $j = 1, 2, \ldots, n$; that is, the broken line $G_i$ is obtained by plotting the values in row $i$ versus the corresponding column averages and joining the successive points on the plot by straight line segments.

If we now suppose, for the moment, that the $y_{ij}$ are free of random errors, then $G_1, G_2, \ldots, G_i, \ldots, G_m$ represent $m$ polygonal lines which will be non-parallel or parallel according as an interaction between the factors A and B exists or does not exist. If random errors also exist, the parallelism associated with non-interaction will be disturbed by the presence of the random errors. It is reasonable to attempt to disentangle the effects of small random errors from those due to genuine non-parallelism of the polygonal lines. If the conditions are such that each $G_i$ can be approximated by a single straight line, then non-parallelism of the $G_i$ is shown up by a variance in the slopes of these lines.
The method of analysis proposed in this paper is essentially a method for sepa- 
rating this slope effect (that is, the effect of a systematic type of interaction 
between A and B) from the disturbances created by random errors. 

A set of two-way classification data for which no specific model can be postu- 
lated prior to analysis, may well conform to an acceptable empirical model 
when it is analyzed in accordance with the proposed method. The further 
breakdown exhibited by Table III provides information regarding the extent 
to which the straight lines approximating the G; may be considered to concur 
in a single point. It should be noted that a bundle of concurrent straight lines is 
nothing other than the geometric representation of a multiplicative model. If 
the point of concurrence lies at the origin of the coordinate axes, the model is of 
the simple form $E(y_{ij}) = A_iB_j$. This model has been treated by Fisher and
MacKenzie [1], as a problem in estimation, by an iterative least squares pro- 
cedure. 

If the point of concurrence is not at the origin, the model is of the form $E(y_{ij} - \gamma_0) = K(A_i - \gamma_0)(B_j - \gamma_0)$. This is readily shown by introducing equation (14) into (11) and making $\delta_i = 0$. Then, using the relation $\gamma_0 = \mu - 1/\alpha$, equation (11) can be written:

$$E(y_{ij}) - \gamma_0 = \frac{(\rho_i + \mu - \gamma_0)(\gamma_j + \mu - \gamma_0)}{\mu - \gamma_0}.$$
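The factorization above can be spot-checked on a grid of arbitrary, invented parameter values. The form obtained from (11) and (14) with $\delta_i = 0$ agrees with the product form on the right-hand side. The code is plain Python and the function names are illustrative.

```python
def lhs(mu, alpha, rho_i, gamma_j):
    """E(y_ij) - gamma_0 from equation (11) with beta_i - 1 = alpha * rho_i."""
    gamma_0 = mu - 1.0 / alpha
    return mu + rho_i + gamma_j + alpha * rho_i * gamma_j - gamma_0


def rhs(mu, alpha, rho_i, gamma_j):
    """Multiplicative form (rho_i + mu - gamma_0)(gamma_j + mu - gamma_0) / (mu - gamma_0)."""
    gamma_0 = mu - 1.0 / alpha
    return (rho_i + mu - gamma_0) * (gamma_j + mu - gamma_0) / (mu - gamma_0)


mu, alpha = 12.0, -0.4
grid = [(r, g) for r in (-2.0, 0.5, 1.5) for g in (-6.0, -1.0, 3.0, 4.0)]
worst = max(abs(lhs(mu, alpha, r, g) - rhs(mu, alpha, r, g)) for r, g in grid)
print(worst)  # zero up to floating-point rounding
```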


Summarizing, we may state the following conclusions: 
If the analysis of Table II yields a mean square for “slopes” that is consider- 
ably larger than that for “residual,” the data may be represented by a bundle 





3 A "normalization" is required, since $\sum_i \beta_i / m = 1$ and $\sum_j \gamma_j = 0$. This is easily accomplished by a simple algebraic rearrangement of (17).







of non-parallel lines, the scatter about the lines being measured by the “residual” 
mean square. If, furthermore, the analysis of Table III shows that the magnitude of the mean square for slopes is due predominantly to concurrence, as confirmed by a plot of $b_i$ versus $\bar y_{i.}$ and by a mean square for "deviations from regression" not significantly larger than the "residual" mean square of Table II, then the data may be represented by a bundle of concurrent lines
and conform to an essentially multiplicative model, of the type of equation 
(17). Thus, the proposed analysis is useful for discovering an empirical model 
that expresses the structure of the data. 


7. A NUMERICAL EXAMPLE 


Table IV shows the data obtained in an interlaboratory study of stress 
measurements on natural rubber vulcanizates. Eleven laboratories determined the stress value of each of seven rubber samples. The data listed in the table are the averages of four independent measurements.⁴


TABLE IV. STRESS IN kg/cm² AT 100% ELONGATION FOR
NATURAL RUBBER VULCANIZATES

[Eleven laboratories (rows) by seven rubbers (columns); the numeric entries are illegible in the source scan.]


The data are analyzed assuming the model represented by equation (12), in which $y_{ij}$ represents the measured stress value average for laboratory $i$ on rubber $j$. The analysis is carried out in terms of equation (8), each slope $b_i$ being calculated in accordance with equation (9). Table V exhibits the analysis of variance. The mean square for slopes is highly significant with respect to that denoted "residual" (their ratio is 0.1994/0.0261 ≈ 7.6, with 10 and 50 degrees of freedom). Thus the laboratory-rubber interaction is largely due to differences in slope. It is therefore unsatisfactory to express laboratory differences for this test in terms of "laboratory biases" only, since the systematic error of a laboratory has been shown to depend on the magnitude of the stress value. Furthermore, the non-additivity test proposed by Tukey would, in this case, not reveal the total amount of non-additivity. Indeed, this test would extract from the interaction the portion labeled "concurrence." The remaining





4 Each rubber was compounded and cured in each laboratory on each of four days. Thus, the four replicates 
represented four independently vulcanized samples. 







TABLE V. ANALYSIS OF VARIANCE FOR DATA OF TABLE IV

Source                          D.F.      S.S.       M.S.
Laboratories                     10      1.4471     .1447
Rubbers                           6    209.15      34.86
Laboratories × Rubbers           60      3.2991     .0550
    Residual                     50      1.3054     .0261
    Slopes                       10      1.9937     .1994
        Concurrence               1      0.0496     .0496
        Non-concurrence           9      1.9441     .2160


part of the interaction would then yield a mean square based on 59 degrees of 
freedom, obtained by pooling “residual” and “non-concurrence,” and equal to 
0.0551. This value is considerably larger than the residual variance of 0.0261, 
given in Table V, and therefore constitutes an overestimation of the error vari- 
ance of the test. Further aspects of analyses of the type described for the inter- 
laboratory evaluation of test methods are discussed elsewhere [2]. 
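The figures quoted in this paragraph follow from Table V by simple arithmetic, as the Python sketch below shows. The degrees of freedom are those implied by $m = 11$ laboratories and $n = 7$ rubbers.

```python
# (source, degrees of freedom, sum of squares) from Table V; m = 11 laboratories, n = 7 rubbers.
table_v = [
    ("Laboratories", 10, 1.4471),
    ("Rubbers", 6, 209.15),
    ("Laboratories x Rubbers", 60, 3.2991),
    ("Residual", 50, 1.3054),
    ("Slopes", 10, 1.9937),
    ("Concurrence", 1, 0.0496),
    ("Non-concurrence", 9, 1.9441),
]
ms = {name: s / d for name, d, s in table_v}
ss = {name: s for name, _, s in table_v}

print(round(ms["Slopes"] / ms["Residual"], 2))   # 7.64: F for slopes, 10 and 50 d.f.
# What remains after Tukey's test extracts "concurrence": residual pooled with non-concurrence.
pooled = (ss["Residual"] + ss["Non-concurrence"]) / (50 + 9)
print(round(pooled, 4))                          # 0.0551, against a residual of 0.0261
```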


8. A GENERALIZED MODEL 


It is easy to generalize the proposed procedure in the following manner. Rather than considering the elements in each row of Table I as linearly related to the column averages, one can assume that they depend on these averages through polynomials of any desired degree, provided that this degree is less than n. One begins by substituting fixed known values $x_1, x_2, \ldots, x_n$, associated with the columns, for the column averages. The procedure then reduces to the well-known device of partitioning the sum of squares for columns, as well as that for row-column interaction, into parts corresponding to the linear, quadratic, ..., components of the column effects. The successive polynomials must be chosen to be mutually orthogonal, which is easily accomplished for any given set $\{x_j\}$. Then, by a line of reasoning entirely analogous to that used to justify the procedure in the linear case, one proves that substituting the column averages for the fixed $x_j$ does not invalidate the procedure.
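The following pure-Python sketch illustrates the partition just described. It is not the author's computation, and the function names and the 3 × 4 table are hypothetical. It assumes a complete p × n table with distinct column values $x_j$. Orthonormal polynomial contrasts in $x_j$ are built by Gram-Schmidt, and the column and interaction sums of squares are then split by degree. When the $x_j$ are the column averages, two things should hold. First, the column sum of squares is entirely linear. Second, the degree-1 interaction component equals $\sum_i (b_i - 1)^2 \sum_j (x_j - \bar x)^2$, where $b_i$ is the slope of row i on the column averages. The second fact is our own algebraic check rather than a formula quoted from the text.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_contrasts(x, degree):
    """Gram-Schmidt on 1, (x - xbar), (x - xbar)^2, ...; returns unit vectors
    for degrees 1..degree, each orthogonal to the constant vector."""
    xbar = sum(x) / len(x)
    basis = []
    for k in range(degree + 1):
        v = [(xi - xbar) ** k for xi in x]
        for q in basis:  # modified Gram-Schmidt: remove earlier directions
            c = _dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm < 1e-12:
            raise ValueError("need at least degree + 1 distinct x values")
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant (degree-0) vector


def partition_by_degree(y, x, degree):
    """Return (column SS by degree, interaction SS by degree, total interaction SS)
    for a complete p x n table y and column values x."""
    p, n = len(y), len(y[0])
    if degree >= n:
        raise ValueError("degree must be less than the number of columns n")
    row_means = [sum(row) / n for row in y]
    col_means = [sum(y[i][j] for i in range(p)) / p for j in range(n)]
    grand = sum(row_means) / p
    # residuals from the additive model, i.e. the row x column interaction
    z = [[y[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
         for i in range(p)]
    contrasts = orthonormal_contrasts(x, degree)
    col_dev = [m - grand for m in col_means]
    col_ss = [p * _dot(col_dev, c) ** 2 for c in contrasts]            # 1 d.f. each
    inter_ss = [sum(_dot(zi, c) ** 2 for zi in z) for c in contrasts]  # p - 1 d.f. each
    total_inter = sum(v * v for zi in z for v in zi)
    return col_ss, inter_ss, total_inter


# Hypothetical 3 x 4 table, with x taken as the column averages as in the text.
y = [[3.1, 4.0, 6.2, 7.9],
     [2.8, 4.4, 5.9, 8.6],
     [3.5, 3.9, 6.8, 7.1]]
x = [sum(col) / len(col) for col in zip(*y)]
col_ss, inter_ss, total_inter = partition_by_degree(y, x, degree=3)
```

With degree = n − 1 the components exhaust the interaction sum of squares. A smaller degree leaves the remainder as a residual.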

REFERENCES

[1] Fisher, R. A. and MacKenzie, W. A., "Studies in Crop Variation. II. The Manurial Response of Different Potato Varieties," Journal of Agricultural Science, 13 (1923), 311.

[2] Mandel, J. and Lashof, T. W., "The Interlaboratory Evaluation of Testing Methods," ASTM Bulletin, No. 239 (July, 1959), p. 53.

[3] Scheffé, H., The Analysis of Variance. New York: John Wiley and Sons, Inc., 1959.

[4] Tukey, J. W., "One Degree of Freedom for Non-Additivity," Biometrics, 5 (1949), 232.

[5] Ward, G. C. and Dick, I. D., "Non-Additivity in Randomized Block Designs and Balanced Incomplete Block Designs," New Zealand Journal of Science and Technology, 33 (1952), 430.





ON COMPARING INTENSITIES OF ASSOCIATION BETWEEN TWO 
BINARY CHARACTERISTICS IN TWO DIFFERENT POPULATIONS* 


AGNES BERGER 
Columbia University 
Asymptotically favorable methods for testing the equality of certain measures of association in different four-fold tables are deduced from the general theory of $\chi^2$-tests as given by Neyman and generalized by Anderson, Goodman and Gold. The tests turn out to be also obtainable by an intuitive use of maximum likelihood estimates as suggested by Wald's large sample theory and include one proposed by Lancaster for testing the second order interaction in a complex contingency table. Applications in the field of drug research and health surveys are indicated.


1. INTRODUCTION 


LET x and y represent two binary characteristics distributed in a population $\pi_1$. Denote their possible values by 1 and 0 and let

$$P_1 = P\{y = 1\} = \text{probability that } y = 1,$$
$$P_2 = 1 - P_1,$$
$$p_1 = P\{x = 1 \mid y = 1\} = \text{probability that } x = 1, \text{ given } y = 1,$$
$$p_2 = P\{x = 1 \mid y = 0\},$$

and

$$p_{12} = P\{x = 1\} = P_1 p_1 + P_2 p_2.$$


Whenever $p_1 \neq p_2$, the variables x and y are called dependent or "associated."

Many indices have been proposed to describe the "degree of dependence" or "strength of association" between x and y. Among these indices we shall be concerned with the following:


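$$\tau_1 = \frac{p_1}{p_2},$$

$$\theta_1 = \frac{p_1(1 - p_2)}{p_2(1 - p_1)},$$

$$\psi_1 = \frac{p_1(1 - p_2) - p_2(1 - p_1)}{p_1(1 - p_2) + p_2(1 - p_1)},$$

$$\phi_1 = (p_1 - p_2) \sqrt{\frac{P_1(1 - P_1)}{p_{12}(1 - p_{12})}}.$$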


Here $\theta_1$ is known as the odds ratio, $\psi_1$ is Yule's coefficient of association (see reference [19]), and $\phi_1$ is the phi coefficient. There is ample literature (see reference [7] for bibliography) on how to interpret the different values assumed by these coefficients and how to test hypotheses about their values.

* Research supported by the United States Air Force School of Aviation Medicine, Brooks Air Force Base, Texas, under Contract No. AF 41(657)-214.
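Suppose we are given a second population $\pi_2$ whose members also possess the characteristics x and y but in which

$$P\{y = 1\} = P_3, \qquad P_4 = 1 - P_3,$$
$$P\{x = 1 \mid y = 1\} = p_3, \qquad P\{x = 1 \mid y = 0\} = p_4,$$
$$P\{x = 1\} = p_{34} = P_3 p_3 + P_4 p_4.$$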


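We may form the analogous indices $\tau_2, \theta_2, \psi_2, \phi_2$ and consider the following hypotheses:

$$H_1:\ \tau_1 = \tau_2, \qquad H_2:\ \theta_1 = \theta_2, \qquad H_3:\ \psi_1 = \psi_2, \qquad H_4:\ \phi_1 = \phi_2.$$

Since $\psi$ is a simple monotone function of the odds ratio, $\psi = (\theta - 1)/(\theta + 1)$, the hypothesis $H_3$ is seen to be identical with $H_2$ and need not be considered separately.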
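A short numerical check makes the definitions concrete. The Python sketch below uses a hypothetical function name and parameter values. It evaluates the four indices for one population from $P_1$, $p_1$, $p_2$. It then confirms two facts: $\psi_1 = (\theta_1 - 1)/(\theta_1 + 1)$, which is why $H_3$ reduces to $H_2$, and $\phi_1$ equals the ordinary product-moment correlation of the two 0/1 variables.

```python
import math


def association_indices(P1, p1, p2):
    """Indices of Section 1 for one population:
    P1 = P{y=1}, p1 = P{x=1 | y=1}, p2 = P{x=1 | y=0}."""
    p12 = P1 * p1 + (1 - P1) * p2                  # P{x = 1}
    tau = p1 / p2
    theta = p1 * (1 - p2) / (p2 * (1 - p1))        # odds ratio
    psi = ((p1 * (1 - p2) - p2 * (1 - p1))
           / (p1 * (1 - p2) + p2 * (1 - p1)))      # Yule's coefficient
    phi = (p1 - p2) * math.sqrt(P1 * (1 - P1) / (p12 * (1 - p12)))
    return tau, theta, psi, phi


# Hypothetical population: P1 = 0.4, p1 = 0.6, p2 = 0.3.
tau, theta, psi, phi = association_indices(P1=0.4, p1=0.6, p2=0.3)
```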

We shall consider the problem of testing these hypotheses under two different methods of collecting observations, called Model I and Model II sampling. To describe Model I, let $\pi_{11}$ be the subpopulation of $\pi_1$ in which y = 1, $\pi_{10}$ the subpopulation of $\pi_1$ in which y = 0, $\pi_{21}$ the subpopulation of $\pi_2$ in which y = 1, and $\pi_{20}$ the subpopulation of $\pi_2$ in which y = 0. Under Model I we assume that a random sample is taken from each of the subpopulations, independently of one another, and that the samples from $\pi_{11}, \pi_{10}, \pi_{21}, \pi_{20}$ contain $n_1, n_2, n_3, n_4$ individuals, respectively. The observations consist in establishing the value of x for each individual in the sample.

Among the many examples in which this model is appropriate we mention the following. Let $\pi_1$ be a population of patients all of whom have a certain symptom A. Suppose that some specified treatment T is applied to the members of a randomly selected subpopulation $\pi_{11}$ of $\pi_1$, while the remainder, $\pi_{10}$, receives no treatment. Let $p_2$ be the proportion of patients who spontaneously recovered from A by the end of some specified time interval in the group $\pi_{10}$ given no treatment, and let $p_1$ be the proportion of patients free of A by the end of the same period in the group $\pi_{11}$ in which the patients are subjected to T. Assume that no deaths occur within the period and that the presence or absence of A can be definitely established for every patient. If $p_1 \neq p_2$, we shall say that T has an effect (good or bad) and call the factors T and A "associated." Once it is decided which of the available indices is a suitable measure of the effectiveness of T, one may be interested in establishing the differential effect of T when applied to different populations: Is T equally effective for males and females; for incipient and advanced cases; or, when given to the same category





ASSOCIATION BETWEEN BINARY CHARACTERISTICS 891 


of patients, is there a difference between two different versions of the treatment, 
say between two different dosages in case of a drug, etc.? 

In studies of this kind, the numbers to be treated and the numbers used as controls would usually be selected by the investigator. Even if the total number that can be studied from $\pi_1$ and from $\pi_2$ is imposed by circumstances beyond the investigator's control, he can as a rule decide what proportion of these samples should be assigned to the treated and to the control group.

Model II differs from Model I only to the extent that in Model II the sizes of the samples selected from the subpopulations are random variables, even if the total sample sizes from $\pi_1$ and $\pi_2$ are predetermined. A more precise description will be given in Section 4.

An example of a problem in which Model II sampling may be appropriate is 
that of comparing the relationship between mental illness and physical handi- 
caps for male and female populations, a question raised in reference [9].! 

So far as is known to this writer, only tests for $H_2$ have received attention in the literature. In an early paper, Bartlett [2], following a suggestion of Fisher [4], considers the problem of testing $H_2$ as one of "testing the second order interaction" between three classifications in a 2 × 2 × 2 table. Bartlett's work sparked some interesting remarks by Simpson [16] and was further extended by Norton [13]. Lancaster [8] reconsidered Bartlett's problem and, using his method of partitioning $\chi^2$, proposed an "asymptotically equal" solution to it.

In this paper we turn to the question of finding asymptotically favorable tests for the three hypotheses mentioned.

A possible approach would be to invoke the theorems of Wald [17] on the asymptotic properties of tests based on maximum likelihood estimates. This would justify the use of certain intuitively appealing statistics, Lancaster's among them: the difference of the two sample analogues of the measure of association divided by an estimate of its asymptotic standard error. Alternatively, one may use a theorem of Neyman [10] to obtain tests asymptotically equivalent to the corresponding likelihood ratio tests. It is shown in this paper that for the hypotheses $H_1$ and $H_2$, Wald's test is identical with Neyman's $\chi^2$-test ("reduced" $\chi^2$) for the case of Model I sampling. For Model II sampling, Wald's test is identical with another $\chi^2$-test, due to Anderson and Goodman [5] and Gold [6], for testing $H_1$ as well as $H_2$ and $H_4$.

Similar work specializing Neyman's results for binomial experiments has been carried out by Reiersøl [14], but for a different class of hypotheses.


Comment added in proof. A referee, whose comments were transmitted to the 
author unfortunately only after this paper was already at the printer’s, has 
called attention to related recent work, most of it by members of the North 
Carolina Institute of Statistics. A list of these papers, compiled by the referee, 
has now been included as a Supplement to References. The present approach 
of applying the results of Neyman and Wald is quite similar to the one em- 
ployed by Bhapkar [20] in the latest of these studies, published while this 
paper was in the referee’s hands. 





1 I want to thank Dr. Gruenberg for mentioning this study to me before it was published. 





2. NEYMAN’S RESULT 
Consider s discrete random variables, the ith capable of assuming $\nu_i$ distinct values, and let the probability of the jth value of the ith variable be $p_{ij}$. Assume that the $p_{ij}$'s are unknown, but that it is given that

$$p_{ij} = f_{ij}(\theta_1, \theta_2, \ldots, \theta_r), \tag{1}$$

where the $f_{ij}$'s are specified functions of the real variables $\theta_1, \theta_2, \ldots, \theta_r$. It is assumed that the $f_{ij}$'s map the range of variation Ω of $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ into an open set and that over Ω the $f_{ij}$'s satisfy the following conditions:

(i) $f_{ij} > 0$, $i = 1, \ldots, s$; $j = 1, \ldots, \nu_i$,
(ii) $\sum_{j=1}^{\nu_i} f_{ij} = 1$, $i = 1, \ldots, s$,
(iii) each $f_{ij}$ possesses continuous partial derivatives at least up to the second order,
(iv) the $\theta_k$'s are functionally independent, i.e., the Jacobian $\left\| \partial f_{ij} / \partial \theta_k \right\|$ is of rank r.
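Let $p = \{p_{11}, \ldots, p_{1,\nu_1 - 1}, \ldots, p_{s1}, \ldots, p_{s,\nu_s - 1}\}$ be a vector of

$$\sum_{i=1}^{s} (\nu_i - 1) = \nu$$

components and note that the assumptions made are equivalent to a set $\mathfrak{F}(\Omega)$ of $(\nu - r)$ restrictions on p:

$$\mathfrak{F}(\Omega):\ F_t(p) = 0, \quad t = 1, 2, \ldots, \nu - r, \quad \nu - r \ge 0. \tag{2}$$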
Let H be a hypothesis ascribing some specific values to some $f < r$ of the parameters:

$$H:\ \theta_k = \theta_k^0, \quad k = 1, 2, \ldots, f.$$

These relations are equivalent to f additional restrictions on p, say

$$F_{\nu - r + k}(p) = 0, \quad k = 1, 2, \ldots, f,$$

so that if H is true, p satisfies a set $\mathfrak{F}(H)$ of $\nu - r + f$ restrictions, namely

$$\mathfrak{F}(H):\ F_t(p) = 0, \quad t = 1, 2, \ldots, \nu - r + f, \tag{4}$$

and the assumptions imply that the $F_t$'s possess continuous partial derivatives at least up to the second order. We shall assume from here on that the restrictions $F(p) = 0$ can be explicitly obtained, i.e., that the elimination of the parameters θ from the set $p_{ij} = f_{ij}(\theta)$ can actually be carried out. This is not necessarily the case in general, even though the conditions guarantee the existence and regularity of the solutions.







Assume now that it is desired to test H, and to this end we select a set n of s positive integers,

$$n = (n_1, n_2, \ldots, n_s),$$

to serve as sample sizes and perform, independently for each i, $n_i$ independent trials on the ith variable, $i = 1, 2, \ldots, s$. In the course of the $\sum_{i=1}^{s} n_i$ trials, let $n_{ij}$ be the frequency and $n_{ij}/n_i = \hat p_{ij}$ the relative frequency of the occurrence of the jth value for the ith variable; further let

$$\hat p = \{\hat p_{11}, \ldots, \hat p_{1,\nu_1 - 1}, \ldots, \hat p_{s1}, \ldots, \hat p_{s,\nu_s - 1}\}. \tag{5}$$

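Define $F_t^*(\hat p, p)$ as the linear part of the Taylor expansion of $F_t(p)$ around the point $p = \hat p$:

$$F_t^*(\hat p, p) = F_t(\hat p) + \sum_{i=1}^{s} \sum_{j=1}^{\nu_i - 1} \left. \frac{\partial F_t(p)}{\partial p_{ij}} \right|_{p = \hat p} (p_{ij} - \hat p_{ij}). \tag{6}$$

Let $\mathfrak{F}^*(\Omega)$ stand for the system

$$F_t^*(\hat p, p) = 0, \quad t = 1, 2, \ldots, \nu - r,$$

and let $\mathfrak{F}^*(H)$ stand for the system

$$F_t^*(\hat p, p) = 0, \quad t = 1, 2, \ldots, \nu - r + f.$$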


Define further $\chi_n^2(\Omega)$ as the minimum value attained by the expression

$$\chi_n^2 = \sum_{i=1}^{s} n_i \sum_{j=1}^{\nu_i} \frac{(p_{ij} - \hat p_{ij})^2}{\hat p_{ij}}, \tag{9}$$

where the $n_i$ and the $\hat p_{ij}$ are observed and the $p_{ij}$'s vary under the restrictions $\mathfrak{F}^*(\Omega)$. Also, let $\chi_n^2(H)$ be the minimum value of $\chi_n^2$ when the variation of the $p_{ij}$ is restricted by $\mathfrak{F}^*(H)$. (It is assumed here that none of the $\hat p_{ij}$ is 0; because of (i), this holds with probability approaching 1 as $n \to \infty$.) Let $\chi_\alpha^2$ be the upper α-percentile of the $\chi^2$ distribution with f degrees of freedom. Let

$$T_n(H) = \chi_n^2(H) - \chi_n^2(\Omega) \tag{10}$$

and consider the following test procedure, to be called "the $\chi^2$-test" for the rest of this paper: Reject H whenever

$$T_n(H) > \chi_\alpha^2;$$

otherwise accept H. Neyman's theorem compares the asymptotic behavior of this test with that of the likelihood ratio test (Λ-test for short). Let $\lambda_n(H)$ be the likelihood ratio statistic for testing H based on the system of samples n and let $\Lambda_n(H) = -2 \log \lambda_n(H)$.





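Theorem of Neyman [10]. Let $0 < \eta_i < 1$ be s rational numbers with $\sum_{i=1}^{s} \eta_i = 1$. Consider an infinite sequence

$$n^\gamma = (n_1^{(\gamma)}, n_2^{(\gamma)}, \ldots, n_s^{(\gamma)}), \quad \gamma = 1, 2, \ldots,$$

of sets of sample sizes such that

$$\sum_{i=1}^{s} n_i^{(\gamma)} \to \infty \quad \text{as } \gamma \to \infty$$

and

$$\frac{n_i^{(\gamma)}}{\sum_{i=1}^{s} n_i^{(\gamma)}} = \eta_i, \quad i = 1, 2, \ldots, s.$$

Let $T_n^\gamma(H)$ and $\Lambda_n^\gamma(H)$ denote the values of $T_n(H)$ and $\Lambda_n(H)$ for $n = n^\gamma$. Then, as $\gamma \to \infty$,

a) if H is true, the distribution of $T_n^\gamma(H)$ tends to that of a central $\chi^2$ with f degrees of freedom,

b) $P_H\{T_n^\gamma > \chi_\alpha^2,\ \Lambda_n^\gamma < \chi_\alpha^2\}$ and $P_H\{T_n^\gamma < \chi_\alpha^2,\ \Lambda_n^\gamma > \chi_\alpha^2\}$ both tend to zero for every $0 < \alpha < 1$,

c) $P_{\bar H}\{T_n^\gamma(H) > \chi_\alpha^2\}$ and $P_{\bar H}\{\Lambda_n^\gamma(H) > \chi_\alpha^2\}$ both tend to 1.

Here $P_H$ means the probability of the relation in the braces that follow under the assumption that H is true, and $P_{\bar H}$ means the probability of the same relation when H is false. The theorem can be summarized as follows. Suppose the total sample size increases without limit while the proportion in which each variable is sampled is held constant. Then the $\chi^2$-test and the Λ-test are both consistent, and the probability that they contradict each other tends to 0 if H is true.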

Corollary: If the number of independent parameters r equals ν, the number of components of p, then the sequence of tests that reject H if and only if

$$T_n^\gamma(H) > \chi_\alpha^2$$

is, in the limit as $\gamma \to \infty$, equivalent to the likelihood ratio test. If the range of $\{f_{ij}(\theta)\}$ is the whole interval $0 < p < 1$, then

$$\chi_n^2(\Omega) = 0 \quad \text{and} \quad T_n(H) = \chi_n^2(H).$$

The proof of this corollary is given in an appendix.







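We shall now calculate the value of the statistic $\chi_n^2(H)$ for the special case of s binomial trials when it is assumed that

$$\nu = \sum_{i=1}^{s} (\nu_i - 1) = s = r$$

and we want to test $H: \theta_1 = \theta_1^0$. In this case $\nu_i = 2$ for every i, and we may write

$$p_{i1} = p_i = f_i(\theta_1, \theta_2, \ldots, \theta_r), \qquad p_{i2} = 1 - p_i,$$
$$p = (p_1, p_2, \ldots, p_s), \qquad \hat p = (\hat p_1, \hat p_2, \ldots, \hat p_s).$$

The $\chi^2$ expression to be minimized becomes

$$\chi_n^2 = \sum_{i=1}^{s} \sum_{j=1}^{2} n_i \frac{(p_{ij} - \hat p_{ij})^2}{\hat p_{ij}} = \sum_{i=1}^{s} n_i \frac{(\hat p_i - p_i)^2}{\hat p_i (1 - \hat p_i)}.$$

As previously, assume that the equations

$$p_i = f_i(\theta_1, \theta_2, \ldots, \theta_r), \quad i = 1, 2, \ldots, s,$$

can be explicitly solved for $\theta_1$, and put

$$\theta_1 = F(p), \qquad \hat\theta_1 = F(\hat p)$$

and

$$F^*(\hat p, p) = \hat\theta_1 - \theta_1^0 + \sum_{i=1}^{s} \frac{\partial \hat\theta_1}{\partial \hat p_i} (p_i - \hat p_i),$$

where $\partial \hat\theta_1 / \partial \hat p_i$ denotes $\partial F(p) / \partial p_i$ evaluated at $p = \hat p$.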
To get $\chi_n^2(H)$, we need to substitute into $\chi_n^2$ the solutions of the following system of equations:

$$\frac{\partial \chi_n^2}{\partial p_i} + \rho \frac{\partial F^*(\hat p, p)}{\partial p_i} = 0, \quad i = 1, 2, \ldots, s,$$
$$F^*(\hat p, p) = 0,$$








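where ρ is a Lagrange multiplier. The first s equations give

$$\frac{2 n_i (p_i - \hat p_i)}{\hat p_i (1 - \hat p_i)} + \rho \frac{\partial \hat\theta_1}{\partial \hat p_i} = 0$$

or

$$p_i - \hat p_i = -\frac{\rho}{2} \frac{\partial \hat\theta_1}{\partial \hat p_i} \frac{\hat p_i (1 - \hat p_i)}{n_i}, \quad i = 1, 2, \ldots, s. \tag{16}$$

Substituting into $F^*(\hat p, p) = 0$, we get

$$\rho = \frac{2(\hat\theta_1 - \theta_1^0)}{\sum_{i=1}^{s} \left( \frac{\partial \hat\theta_1}{\partial \hat p_i} \right)^2 \frac{\hat p_i (1 - \hat p_i)}{n_i}}. \tag{17}$$

Introducing this into the expressions for $p_i - \hat p_i$, we get

$$p_i - \hat p_i = -\frac{\hat\theta_1 - \theta_1^0}{\sum_{j=1}^{s} \left( \frac{\partial \hat\theta_1}{\partial \hat p_j} \right)^2 \frac{\hat p_j (1 - \hat p_j)}{n_j}} \cdot \frac{\partial \hat\theta_1}{\partial \hat p_i} \frac{\hat p_i (1 - \hat p_i)}{n_i}. \tag{18}$$

It follows that

$$\chi_n^2(H) = \frac{(\hat\theta_1 - \theta_1^0)^2}{\sum_{i=1}^{s} \left( \frac{\partial \hat\theta_1}{\partial \hat p_i} \right)^2 \frac{\hat p_i (1 - \hat p_i)}{n_i}}. \tag{19}$$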
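The denominator of this expression is an estimate of the asymptotic variance of $\hat\theta_1$. This can be seen by expanding $\hat\theta_1$ around its true value in a Taylor series:

$$\hat\theta_1 = \theta_1 + \sum_{i=1}^{s} \frac{\partial \theta_1}{\partial p_i} (\hat p_i - p_i) + \cdots. \tag{20}$$

The asymptotic variance of $\hat\theta_1$ (e.g., see Neyman [10]) is given by

$$\sigma_{\hat\theta_1}^2 = \sum_{i=1}^{s} \left( \frac{\partial \theta_1}{\partial p_i} \right)^2 \frac{p_i (1 - p_i)}{n_i}. \tag{21}$$

Replacing $p_i$ by $\hat p_i$ we get

$$\hat\sigma_{\hat\theta_1}^2 = \sum_{i=1}^{s} \left( \frac{\partial \hat\theta_1}{\partial \hat p_i} \right)^2 \frac{\hat p_i (1 - \hat p_i)}{n_i}. \tag{22}$$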
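Taken together, (19) and (22) give a simple recipe. Square the deviation of $\hat\theta_1$ from $\theta_1^0$ and divide by its estimated delta-method variance. The Python sketch below assumes $\theta_1$ is supplied as a function of $(p_1, \ldots, p_s)$. Unlike the text, it approximates the partial derivatives by central differences. The function name and the step size `h` are illustrative choices, not part of the original.

```python
def chi2_single_parameter(theta_fn, p_hat, n, theta0, h=1e-6):
    """Statistic (19): (theta_hat - theta0)^2 / sigma_hat^2, with sigma_hat^2
    as in (22) for s independent binomial samples of sizes n[i]."""
    theta_hat = theta_fn(p_hat)
    var = 0.0
    for i, (pi, ni) in enumerate(zip(p_hat, n)):
        up = list(p_hat)
        dn = list(p_hat)
        up[i] = pi + h
        dn[i] = pi - h
        deriv = (theta_fn(up) - theta_fn(dn)) / (2 * h)  # approx. d theta / d p_i
        var += deriv ** 2 * pi * (1 - pi) / ni
    return (theta_hat - theta0) ** 2 / var


# Example: theta_1 = p_1 - p_2 and H: theta_1 = 0, two binomial samples.
stat = chi2_single_parameter(lambda p: p[0] - p[1], [0.3, 0.5], [100, 80], 0.0)
```

For this linear $\theta_1$ the statistic reduces to the familiar $(\hat p_1 - \hat p_2)^2 / [\hat p_1(1 - \hat p_1)/n_1 + \hat p_2(1 - \hat p_2)/n_2]$. Here that is $0.04/0.005225 \approx 7.66$.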
3. APPLICATION TO MEASURES OF ASSOCIATION UNDER MODEL I SAMPLING 


For the first two measures introduced the application is immediate. Under Model I we have $n_i$ independent observations from the ith of four binomial populations with probability of success $p_i$, $0 < p_i < 1$, $i = 1, 2, 3, 4$. The first two hypotheses set up in the introduction can be written as

$$H_j:\ G_j(p) = 0, \quad j = 1, 2, \tag{23}$$

where

$$G_1(p) = \frac{p_1}{p_2} - \frac{p_3}{p_4}, \tag{24}$$

$$G_2(p) = \frac{p_1}{1 - p_1} \Big/ \frac{p_2}{1 - p_2} - \frac{p_3}{1 - p_3} \Big/ \frac{p_4}{1 - p_4}. \tag{25}$$

To test $H_j$, for fixed j, put $\theta_1 = G_j$, $\theta_i = p_i$, $i = 2, 3, 4$; solving these for $p_i$ gives

$$p_i = f_i(\theta), \quad i = 1, 2, 3, 4, \tag{26}$$

with the $f_i(\theta)$ satisfying the conditions of Neyman's theorem. Thus the test

$$\chi_n^2(H_j) = \frac{G_j(\hat p)^2}{\sum_{i=1}^{4} \left( \frac{\partial G_j}{\partial p_i} \right)^2 \frac{\hat p_i (1 - \hat p_i)}{n_i}} > \chi_\alpha^2 \tag{27}$$

is asymptotically equivalent to the likelihood ratio test for testing $H_j$, $j = 1, 2$. Here $\chi_\alpha^2$ is the $(1 - \alpha)$th percentile of the chi-square distribution with one degree of freedom.
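Thus we have

$$\chi_n^2(H_1) = \frac{\left( \dfrac{\hat p_1}{\hat p_2} - \dfrac{\hat p_3}{\hat p_4} \right)^2}{\dfrac{\hat p_1 (1 - \hat p_1)}{n_1 \hat p_2^2} + \dfrac{\hat p_1^2 (1 - \hat p_2)}{n_2 \hat p_2^3} + \dfrac{\hat p_3 (1 - \hat p_3)}{n_3 \hat p_4^2} + \dfrac{\hat p_3^2 (1 - \hat p_4)}{n_4 \hat p_4^3}} \tag{28}$$

and

$$\chi_n^2(H_2) = \frac{\left( \dfrac{\hat p_1 (1 - \hat p_2)}{(1 - \hat p_1) \hat p_2} - \dfrac{\hat p_3 (1 - \hat p_4)}{(1 - \hat p_3) \hat p_4} \right)^2}{\dfrac{\hat p_1 (1 - \hat p_2)^2}{n_1 (1 - \hat p_1)^3 \hat p_2^2} + \dfrac{\hat p_1^2 (1 - \hat p_2)}{n_2 (1 - \hat p_1)^2 \hat p_2^3} + \dfrac{\hat p_3 (1 - \hat p_4)^2}{n_3 (1 - \hat p_3)^3 \hat p_4^2} + \dfrac{\hat p_3^2 (1 - \hat p_4)}{n_4 (1 - \hat p_3)^2 \hat p_4^3}}. \tag{29}$$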
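Below, (28) and (29) are transcribed directly into Python as a sketch. The function names and the counts in the example are hypothetical. The estimates $\hat p_i = a_i/n_i$ must lie strictly between 0 and 1 for the formulas to be defined.

```python
def chi2_H1(p, n):
    """Statistic (28) for H1: p1/p2 = p3/p4, Model I sampling."""
    p1, p2, p3, p4 = p
    n1, n2, n3, n4 = n
    num = (p1 / p2 - p3 / p4) ** 2
    den = (p1 * (1 - p1) / (n1 * p2 ** 2)
           + p1 ** 2 * (1 - p2) / (n2 * p2 ** 3)
           + p3 * (1 - p3) / (n3 * p4 ** 2)
           + p3 ** 2 * (1 - p4) / (n4 * p4 ** 3))
    return num / den


def chi2_H2(p, n):
    """Statistic (29) for H2: equal odds ratios, Model I sampling."""
    p1, p2, p3, p4 = p
    n1, n2, n3, n4 = n
    num = (p1 * (1 - p2) / ((1 - p1) * p2) - p3 * (1 - p4) / ((1 - p3) * p4)) ** 2
    den = (p1 * (1 - p2) ** 2 / (n1 * (1 - p1) ** 3 * p2 ** 2)
           + p1 ** 2 * (1 - p2) / (n2 * (1 - p1) ** 2 * p2 ** 3)
           + p3 * (1 - p4) ** 2 / (n3 * (1 - p3) ** 3 * p4 ** 2)
           + p3 ** 2 * (1 - p4) / (n4 * (1 - p3) ** 2 * p4 ** 3))
    return num / den


# Hypothetical counts a_i out of n_i = 100 in each of the four subpopulations.
a = (60, 30, 50, 50)
n = (100, 100, 100, 100)
p_hat = [ai / ni for ai, ni in zip(a, n)]
stat1, stat2 = chi2_H1(p_hat, n), chi2_H2(p_hat, n)
```

Each statistic is then compared with $\chi_\alpha^2$ for one degree of freedom, for example 3.841 at α = 0.05.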
For later use we calculate $\lambda_n(H)$ for Model I:

$$\lambda_n(H) = \frac{\max_{p \in H} \prod_{i=1}^{4} p_i^{a_i} (1 - p_i)^{n_i - a_i}}{\max_{p} \prod_{i=1}^{4} p_i^{a_i} (1 - p_i)^{n_i - a_i}}, \tag{30}$$







where $a_i$ is the number of the $n_i$ observations for which x = 1, $i = 1, 2, 3, 4$.

The last listed measure of association, the φ coefficient, also depends on the parameters $P_1$ and $P_3$. If these are assumed known, the previous argument also applies to testing $H_4$. Of special interest is the case when the sample sizes $n_i$ are chosen so as to satisfy

$$\frac{n_1}{n_1 + n_2} = P_1, \qquad \frac{n_3}{n_3 + n_4} = P_3.$$

For this case, let

$$\hat\phi_1 = (\hat p_1 - \hat p_2)(n_1 n_2)^{1/2}(n_1 \hat p_1 + n_2 \hat p_2)^{-1/2}[n_1(1 - \hat p_1) + n_2(1 - \hat p_2)]^{-1/2}, \tag{31}$$

$$\hat\phi_2 = (\hat p_3 - \hat p_4)(n_3 n_4)^{1/2}(n_3 \hat p_3 + n_4 \hat p_4)^{-1/2}[n_3(1 - \hat p_3) + n_4(1 - \hat p_4)]^{-1/2}. \tag{32}$$

The chi-square test for testing

$$H_4:\ G_4 = \phi_1 - \phi_2 = 0 \tag{33}$$

consists of rejecting $H_4$ whenever

$$\frac{(\hat\phi_1 - \hat\phi_2)^2}{\sum_{i=1}^{4} \left( \frac{\partial G_4}{\partial p_i} \right)^2 \frac{\hat p_i (1 - \hat p_i)}{n_i}} > \chi_\alpha^2. \tag{34}$$
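Writing out the partial derivatives of $G_4$ in (34) by hand is tedious. The sketch below therefore evaluates them numerically, which is an assumption of this illustration and not the text's procedure. The $n_i$ are held fixed, so $P_1$ and $P_3$ enter only through the sample-size ratios. The function names and data are hypothetical.

```python
import math


def phi_hat(pa, pb, na, nb):
    """Sample phi coefficient (31)/(32) for one population."""
    return ((pa - pb) * math.sqrt(na * nb)
            / math.sqrt((na * pa + nb * pb) * (na * (1 - pa) + nb * (1 - pb))))


def chi2_H4_model_I(p, n, h=1e-6):
    """Left-hand side of (34); derivatives of G4 by central differences."""
    def G4(q):
        return phi_hat(q[0], q[1], n[0], n[1]) - phi_hat(q[2], q[3], n[2], n[3])

    var = 0.0
    for i in range(4):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        deriv = (G4(up) - G4(dn)) / (2 * h)
        var += deriv ** 2 * p[i] * (1 - p[i]) / n[i]
    return G4(p) ** 2 / var


# Hypothetical data: 24 of 40 and 18 of 60 in pi_1; 15 of 30 and 28 of 70 in pi_2.
p_hat = [24 / 40, 18 / 60, 15 / 30, 28 / 70]
n = [40, 60, 30, 70]
stat = chi2_H4_model_I(p_hat, n)
```

As a check, (31) with $n_1 = 40$, $n_2 = 60$, $\hat p_1 = 0.6$, $\hat p_2 = 0.3$ gives the usual fourfold-table value $(ad - bc)/\sqrt{\text{product of the margins}} = 720/\sqrt{42 \cdot 58 \cdot 40 \cdot 60}$.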
4. TESTING MEASURES OF ASSOCIATION UNDER MODEL II SAMPLING 


We are interested in studying the association between two dichotomous variables x and y in each of two populations $\pi_1$ and $\pi_2$. We assume that the probability structure in the two populations can be represented as follows:

$$\begin{array}{ll}
\text{In } \pi_1 & \text{In } \pi_2 \\
P\{y = 1\} = P_1, & P\{y = 1\} = P_3, \\
P\{y = 0\} = P_2 = 1 - P_1, & P\{y = 0\} = P_4 = 1 - P_3, \\
P\{x = 1 \mid y = 1\} = p_1, & P\{x = 1 \mid y = 1\} = p_3, \\
P\{x = 1 \mid y = 0\} = p_2, & P\{x = 1 \mid y = 0\} = p_4.
\end{array}$$

We assume that $0 < P_i < 1$, $P_i$ rational. We assume also that $0 < p_i < 1$, $i = 1, 2, 3, 4$. All these numbers are unknown.

Suppose we take $N_1$ observations from $\pi_1$ and $N_2$ observations from $\pi_2$ on the pair (x, y). Let $\nu_1$ denote the number of observations from $\pi_1$ for which y = 1; of these, let $a_1$ denote the number for which x = 1. Let $\nu_2$ denote the number of observations from $\pi_1$ for which y = 0, so that $\nu_1 + \nu_2 = N_1$, and let $a_2$ denote the number of these for which x = 1. Similarly, let $\nu_3$ denote the number of observations from $\pi_2$ for which y = 1, and $a_3$ the number of these for which x = 1. Let $\nu_4$ be the number of observations from $\pi_2$ for which y = 0, so that $\nu_3 + \nu_4 = N_2$, and $a_4$ the number of these for which x = 1. We write

$$\hat p_i = \frac{a_i}{\nu_i}, \quad i = 1, 2, 3, 4, \tag{35}$$







with

$$p = (p_1, p_2, p_3, p_4), \tag{36}$$
$$\nu = (\nu_1, \nu_2, \nu_3, \nu_4), \tag{37}$$
$$a = (a_1, a_2, a_3, a_4), \tag{38}$$
$$\hat p = (\hat p_1, \hat p_2, \hat p_3, \hat p_4). \tag{39}$$
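To make the bookkeeping of (35) to (39) concrete, the Python sketch below tallies ν and a from raw Model II observations. The record format `(population, y, x)` and the function name are assumptions made for illustration. By (41), the resulting ν can then be used in place of n in the Model I formulas.

```python
def tally_model_II(records):
    """records: iterable of (population, y, x), population in {1, 2},
    y and x in {0, 1}. Index k = 0..3 corresponds to i = 1..4 of (35):
    (pi_1, y=1), (pi_1, y=0), (pi_2, y=1), (pi_2, y=0)."""
    nu = [0, 0, 0, 0]
    a = [0, 0, 0, 0]
    for population, y, x in records:
        k = 2 * (population - 1) + (1 - y)
        nu[k] += 1
        a[k] += x
    # p_hat_i = a_i / nu_i; undefined (NaN) when nu_i = 0
    p_hat = [ak / vk if vk else float("nan") for ak, vk in zip(a, nu)]
    return nu, a, p_hat


# Hypothetical sample with N1 = N2 = 3; the nu_i themselves are random under Model II.
records = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (2, 1, 1), (2, 0, 0), (2, 0, 0)]
nu, a, p_hat = tally_model_II(records)
```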


Let H be any of the hypotheses $H_1$, $H_2$. The likelihood ratio statistic for testing H is

$$\lambda_\nu^{II}(H) = \frac{\max_{P_i,\, p \in H} \prod_{i=1}^{4} P_i^{\nu_i} p_i^{a_i} (1 - p_i)^{\nu_i - a_i}}{\max_{P_i,\, p} \prod_{i=1}^{4} P_i^{\nu_i} p_i^{a_i} (1 - p_i)^{\nu_i - a_i}}. \tag{40}$$

The likelihood function factors into functions containing one parameter at a time, and H does not involve the $P_i$'s. Hence the maximum likelihood estimates of the $P_i$ are identical whether H holds or not. It follows that

$$\lambda_\nu^{II}(H) = \frac{\max_{p \in H} \prod_{i=1}^{4} p_i^{a_i} (1 - p_i)^{\nu_i - a_i}}{\max_{p} \prod_{i=1}^{4} p_i^{a_i} (1 - p_i)^{\nu_i - a_i}} = \lambda_\nu(H), \tag{41}$$

where $\lambda_\nu(H)$ is the likelihood ratio statistic for testing H under Model I for $n = \nu$.
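Corollary. If, for $\gamma = 1, 2, \ldots$, the sequence of sample sizes $N_1^\gamma, N_2^\gamma \to \infty$ while

$$\frac{N_1^\gamma}{N_1^\gamma + N_2^\gamma} = \text{constant for all } \gamma,$$

then the sequence of tests

$$-2 \log \lambda_{\nu^\gamma}(H) > \chi_\alpha^2$$

is consistent.

This follows from the fact (see Neyman [10]) that the sequence of tests $-2 \log \lambda_{n^\gamma} > \chi_\alpha^2$ is consistent.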

Let us again consider a sequence of sample sizes $N_1^\gamma$, $N_2^\gamma$ satisfying the conditions in the corollary above. For each of the observations $(\nu^{(\gamma)}, a^{(\gamma)})$, let us calculate $\chi_{\nu^{(\gamma)}}^2(H)$ as if $\nu^{(\gamma)}$ were given in advance as in Model I.

Two facts hold. The asymptotic conditional distribution of $\chi_\nu^2(H)$ given ν is the same as the asymptotic conditional distribution of $-2 \log \lambda_\nu^{II}(H)$ given ν. Also, the over-all asymptotic distribution of $-2 \log \lambda_\nu^{II}(H)$ is the same as the asymptotic distribution of $\chi_{N_1, N_2}^2(H)$, the $\chi^2$ statistic based on 2 independent quadrinomials. The question therefore arises whether the likelihood ratio test based on $\lambda_\nu^{II}(H)$ is asymptotically equivalent to the test T(H) that rejects H whenever







$$\chi_\nu^2(H) > \chi_\alpha^2.$$


The answer is in the affirmative and is implied by a theorem of Gold [6], stated below. As will be seen, the theorem to be quoted also implies the asymptotic equivalence of the T(H) test with the likelihood ratio test for the hypothesis $H_4$.


5. THEOREM OF GOLD [6]


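The theorem refers to a Markov chain with a finite set of states $S = \{1, 2, \ldots, m\}$ and parameter values $t = 0, 1, 2, \ldots, T$, usually called observation times. For any given value of t the set of possible states is not necessarily the whole set S but possibly only a proper subset of S. Let $S(t)$ denote the set of states possible at t. For $i \in S(t-1)$, we define

$$p_{ij}(t) = P\{j \text{ at } t \mid i \text{ at } t - 1\}. \tag{42}$$

Further, for $i \in S(t-1)$, let $S_i(t) = \{j \mid p_{ij}(t) > 0\}$. Assume that the values of $p_{ij}(t)$, $i \in S(t-1)$, $j \in S_i(t)$, are unknown. Assume, however, that we are given a set of functions $\phi_{ij}(t)$, $i \in S(t-1)$, defined over some range of variation Ω of the vector parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$. It is known that in the range of variation of θ

$$p_{ij}(t) = \phi_{ij}(t), \quad i \in S(t-1),\ j \in S_i(t),\ t = 1, 2, \ldots, T, \qquad \sum_{j \in S_i(t)} \phi_{ij}(t) = 1, \tag{43}$$

where $\phi_{ij}(t)$ has continuous second order partial derivatives with respect to each $\theta_k$. It is also known that, for every θ′ in the range of θ, at least one of the determinants of order r, say

$$\begin{vmatrix}
\dfrac{\partial \phi_{a_1 b_1}(t_1)}{\partial \theta_1} & \cdots & \dfrac{\partial \phi_{a_1 b_1}(t_1)}{\partial \theta_r} \\
\vdots & & \vdots \\
\dfrac{\partial \phi_{a_r b_r}(t_r)}{\partial \theta_1} & \cdots & \dfrac{\partial \phi_{a_r b_r}(t_r)}{\partial \theta_r}
\end{vmatrix}, \tag{44}$$

is different from zero in some neighborhood of θ′.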


For each t and $i \in S(t-1)$, let us select one state in $S_i(t)$ and exclude it from $S_i(t)$, denoting the set of the remaining states by $S_i'(t)$. Put

$$p_M = \{p_{ij}(t),\ i \in S(t-1),\ j \in S_i'(t),\ t = 1, \ldots, T\}. \tag{45}$$

Then, if $\nu_M$ stands for the dimensionality of $p_M$, the assumptions made can again be summarized by $\nu_M - r$ restrictions on $p_M$ of the form

$$\mathfrak{F}_M(\Omega):\ F_{Mt}(p_M) = 0, \quad t = 1, 2, \ldots, \nu_M - r. \tag{46}$$

Let $H_M$ be a hypothesis assigning some specific values to some $f_M < r$ of the parameters. These values can again be summarized by additional restrictions on $p_M$, so that if H is true, $p_M$ satisfies





$$\mathfrak{F}_M(H):\ F_{Mt}(p_M) = 0, \quad t = 1, 2, \ldots, \nu_M - r + f_M, \tag{47}$$

implying the same regularity conditions on the $F_{Mt}$ as before, and again we shall assume that the system $\mathfrak{F}_M$ can be explicitly obtained. To obtain a test statistic for testing H, assume that we shall observe N individuals and that, of these, $n_i(t)$ are in state i at time t, $i \in S(t)$. Assume that

$$\hat p_i(t) = \frac{n_i(t)}{N}, \tag{48}$$

and that $n_{ij}(t)$ is the number of those who were in i at $t - 1$ and are in j at t, $i \in S(t-1)$. Let $n_i(0)/N = \hat p_i(0)$, $i \in S(0)$, and for $n_i(t-1) \neq 0$ let

$$\hat p_{ij}(t) = \frac{n_{ij}(t)}{n_i(t-1)}, \quad i \in S(t-1),\ j \in S_i(t),\ t = 1, \ldots, T, \tag{49}$$

$$\hat p_M = \{\hat p_{ij}(t),\ i \in S(t-1),\ j \in S_i'(t)\}. \tag{50}$$

We remark (see Anderson and Goodman [1]) that for the indices exhibited

$$\lim_{N \to \infty} P\{n_i(t-1) \neq 0\} = 1,$$

so that the measure of the set on which a $\hat p_{ij}(t)$ is not defined decreases to 0 with increasing N.

Let $F_{Mt}^*(\hat p, p)$ be the linear part of the Taylor expansion of $F_{Mt}(p)$ around the point $p_M = \hat p_M$. Let $\mathfrak{F}_M^*(\Omega)$ be the system $F_{Mt}^*(\hat p_M, p_M) = 0$, $t = 1, 2, \ldots, \nu_M - r$, and let $\mathfrak{F}_M^*(H)$ be the system $F_{Mt}^*(\hat p_M, p_M) = 0$, $t = 1, 2, \ldots, \nu_M - r + f$. Form now the quadratic

$$Q = \sum_{t=1}^{T} \sum_{i \in S(t-1)} \sum_{j \in S_i(t)} n_i(t-1) \frac{[\hat p_{ij}(t) - p_{ij}(t)]^2}{\hat p_{ij}(t)} = \sum^{(t)} n_i(t-1) \frac{[\hat p_{ij}(t) - p_{ij}(t)]^2}{\hat p_{ij}(t)}, \tag{51}$$

where $\sum^{(t)}$ denotes the summation over the range of indices explicitly shown above and where it is assumed that no $\hat p_{ij}(t)$ is 0. (Because the summation extends only over indices where $p_{ij}(t) \neq 0$, the probability that $\hat p_{ij}(t) = 0$ tends to 0 as N increases indefinitely.)

Let $Q(\Omega)$ be the minimum of Q obtained by holding $\hat p$ fixed and permitting p to vary subject to the restrictions $\mathfrak{F}_M^*(\Omega)$. Let $Q(H)$ be the corresponding minimum of Q when p is restricted by the system $\mathfrak{F}_M^*(H)$, and let

$$T_M = Q(H) - Q(\Omega). \tag{52}$$

Consider now a sequence of samples of sizes $N^{(\mu)}$, $\mu = 1, 2, \ldots$, such that $N^{(\mu)} \to \infty$ as $\mu \to \infty$, and let $T_M^{(\mu)}$ be the test statistic $T_M$ corresponding to the










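sample $N^{(\mu)}$. Further, let $\eta_1, \eta_2, \ldots, \eta_{s(0)}$ be any set of positive constants with their sum equal to 1, and for any sequence of events $A(N^{(\mu)})$, let

$$P_\eta[A(N^{(\mu)})] = P[A(N^{(\mu)}) \mid n_i(0) = \eta_i N^{(\mu)},\ i \in S(0)]. \tag{53}$$

Under these specifications, we have the following theorem, in which $\Lambda^{(\mu)}$ denotes $-2 \log \lambda$ for the sample $N^{(\mu)}$:

Theorem of Gold

(a) If H is true, the sequence of test statistics $T_M^{(\mu)}$ is distributed in the limit, as $\mu \to \infty$, as $\chi^2$ with f degrees of freedom.

(b) If H is true,

$$P\{T_M^{(\mu)} > \chi_\alpha^2,\ \Lambda^{(\mu)} < \chi_\alpha^2\} \to 0, \qquad P\{T_M^{(\mu)} < \chi_\alpha^2,\ \Lambda^{(\mu)} > \chi_\alpha^2\} \to 0.$$

(c) If H is false,

$$P\{T_M^{(\mu)} > \chi_\alpha^2\} \to 1.$$

(d) Statements (a), (b), (c) hold true when P is exchanged with $P_\eta$.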


6. COMPARISON OF THE STRENGTH OF ASSOCIATION IN $\pi_1$ AND $\pi_2$


Returning to the problem of comparing the strength of association in $\pi_1$ with that in $\pi_2$ under Model II sampling, we shall associate with the system of observations ν, $\hat p$ defined in Section 4 a Markov chain defined as follows.

To every individual sampled we associate a sample point $\omega = (\omega_0, \omega_1, \omega_2)$. Here $\omega_0$ is 1 or 0 according to whether the individual comes from $\pi_1$ or $\pi_2$, $\omega_1$ is 1 or 0 according to whether his y value is 1 or 0, and $\omega_2$ is 1 or 0 according to whether his x value is 1 or 0.

Select arbitrarily a number η′, $0 < \eta' < 1$, and to the eight sample points assign the following probabilities:

$$\begin{aligned}
P(\omega^1) &= \eta' P_1 p_1, & P(\omega^2) &= (1 - \eta') P_3 p_3, \\
P(\omega^3) &= \eta' (1 - P_1) p_2, & P(\omega^4) &= (1 - \eta')(1 - P_3) p_4, \\
P(\omega^5) &= \eta' P_1 (1 - p_1), & P(\omega^6) &= (1 - \eta') P_3 (1 - p_3), \\
P(\omega^7) &= \eta' (1 - P_1)(1 - p_2), & P(\omega^8) &= (1 - \eta')(1 - P_3)(1 - p_4).
\end{aligned} \tag{54}$$


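On this sample space we define the random variables $\xi_t$, $t = 0, 1, 2$, by assigning to them values from a state space of eight points $S = \{1, 2, \ldots, 8\}$ according to the following scheme:

$$\xi_0(\omega) = \begin{cases} 1 & \text{if } \omega_0 = 1, \\ 2 & \text{if } \omega_0 = 0; \end{cases}$$

$$\xi_1(\omega) = \begin{cases} 1 & \text{if } \omega_0 = 1,\ \omega_1 = 1, \\ 2 & \text{if } \omega_0 = 0,\ \omega_1 = 1, \\ 3 & \text{if } \omega_0 = 1,\ \omega_1 = 0, \\ 4 & \text{if } \omega_0 = 0,\ \omega_1 = 0; \end{cases}$$

$$\xi_2(\omega) = \begin{cases} 1 & \text{if } \omega_0 = 1,\ \omega_1 = 1,\ \omega_2 = 1, \\ 2 & \text{if } \omega_0 = 0,\ \omega_1 = 1,\ \omega_2 = 1, \\ 3 & \text{if } \omega_0 = 1,\ \omega_1 = 0,\ \omega_2 = 1, \\ 4 & \text{if } \omega_0 = 0,\ \omega_1 = 0,\ \omega_2 = 1, \\ 5 & \text{if } \omega_0 = 1,\ \omega_1 = 1,\ \omega_2 = 0, \\ 6 & \text{if } \omega_0 = 0,\ \omega_1 = 1,\ \omega_2 = 0, \\ 7 & \text{if } \omega_0 = 1,\ \omega_1 = 0,\ \omega_2 = 0, \\ 8 & \text{if } \omega_0 = 0,\ \omega_1 = 0,\ \omega_2 = 0. \end{cases} \tag{55}$$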
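The construction in (54) and (55) can be checked mechanically. The Python sketch below maps each sample point ω to its states $(\xi_0, \xi_1, \xi_2)$ and evaluates the probabilities (54). The helper names and the numerical parameter values are hypothetical. The check confirms that the eight probabilities sum to 1 and that each state at t = 2 is reached by exactly one sample point.

```python
from itertools import product


def states(omega):
    """(xi_0, xi_1, xi_2) of (55) for omega = (w0, w1, w2)."""
    w0, w1, w2 = omega
    xi0 = 1 if w0 == 1 else 2
    xi1 = {(1, 1): 1, (0, 1): 2, (1, 0): 3, (0, 0): 4}[(w0, w1)]
    xi2 = xi1 if w2 == 1 else xi1 + 4
    return xi0, xi1, xi2


def point_probability(omega, eta, P1, P3, p):
    """Probability (54) of omega; p = (p1, p2, p3, p4)."""
    w0, w1, w2 = omega
    if w0 == 1:   # individual from pi_1
        weight = eta
        prob_y = P1 if w1 == 1 else 1 - P1
        prob_x1 = p[0] if w1 == 1 else p[1]
    else:         # individual from pi_2
        weight = 1 - eta
        prob_y = P3 if w1 == 1 else 1 - P3
        prob_x1 = p[2] if w1 == 1 else p[3]
    return weight * prob_y * (prob_x1 if w2 == 1 else 1 - prob_x1)


points = list(product((0, 1), repeat=3))
probs = {w: point_probability(w, 0.4, 0.3, 0.6, (0.2, 0.5, 0.7, 0.1)) for w in points}
```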





The initial distribution of the chain is given by

$$P_1(0) = \eta', \quad P_2(0) = 1 - \eta', \quad P_i(0) = 0 \text{ for } i \neq 1, 2;$$

the transition probabilities are

$$p_{11}(1) = P_1, \quad p_{13}(1) = 1 - P_1, \quad p_{22}(1) = P_3, \quad p_{24}(1) = 1 - P_3,$$
$$p_{ii}(2) = p_i, \quad p_{i,i+4}(2) = 1 - p_i, \quad i = 1, 2, 3, 4,$$

while all transition probabilities not listed are 0, corresponding to the relations

$$S_1(0) = \{1, 3\}; \quad S_2(0) = \{2, 4\}; \quad S(1) = \{1, 2, 3, 4\}; \quad S_i(1) = \{i, i + 4\} \text{ for } i = 1, 2, 3, 4. \tag{58}$$

The restrictions imposed by $H_j$, $j = 1, 2$, on the vector p, and in the case of $H_4$ on p, $P_1$, $P_3$, can now be interpreted as restrictions $\mathfrak{F}_M(H_j)$ on the transition probabilities of the chain. Accordingly, they can be tested by the test statistics $T_M$ based on the observation vector $\hat p_M$ given by

$$\hat p_{11}(1) = \frac{\nu_1}{N_1}, \quad \hat p_{13}(1) = \frac{\nu_2}{N_1}, \quad \hat p_{22}(1) = \frac{\nu_3}{N_2}, \quad \hat p_{24}(1) = \frac{\nu_4}{N_2},$$
$$\hat p_{ii}(2) = \hat p_i, \quad \hat p_{i,i+4}(2) = 1 - \hat p_i, \quad i = 1, 2, 3, 4,$$

where the sample sizes $N_1$ and $N_2$ are such that

$$\frac{N_1}{N_1 + N_2} = \eta'.$$

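To simplify notation, we may put

$$N_1 = \nu_5, \quad N_2 = \nu_6, \quad P_1 = p_5, \quad P_3 = p_6, \quad \hat p_5 = \frac{\nu_1}{\nu_5}, \quad \hat p_6 = \frac{\nu_3}{\nu_6}.$$

With this notation we get

$$Q_M = \sum_{i=1}^{6} \nu_i \frac{(\hat p_i - p_i)^2}{\hat p_i (1 - \hat p_i)}. \tag{60}$$

Note that $Q_M(\Omega) = 0$ and that $H_1$, $H_2$ involve neither $p_5$ nor $p_6$. It follows that for $j = 1, 2$ the statistics $T_M(H_j)$ are identical with $\chi_\nu^2(H_j)$. Moreover, under the specified limiting conditions (i.e., $N_1^{(\mu)}/(N_1^{(\mu)} + N_2^{(\mu)})$ constant while $\mu \to \infty$), the asymptotic distribution of the random variables $\chi_\nu^2(H_j)$ obtained under Model II sampling is the same as the asymptotic conditional distribution of the variables $T_M^{(\mu)}(H_j)$ given

$$n_1(0) = \frac{N_1}{N_1 + N_2} N^{(\mu)}, \qquad n_2(0) = \frac{N_2}{N_1 + N_2} N^{(\mu)}. \tag{61}$$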








It follows by part (d) of the theorem of Gold that the asymptotic distribution of $\chi_\nu^2(H_j)$ is the same under Model II as under Model I sampling.
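To test $H_4$, $Q_M$ has to be minimized under the restriction $\mathfrak{F}_M(H_4)$. $H_4$ states that

$$\begin{aligned}
G_4 = {} & (p_1 - p_2)[p_5(1 - p_5)]^{1/2}[p_1 p_5 + p_2(1 - p_5)]^{-1/2}[1 - p_1 p_5 - p_2(1 - p_5)]^{-1/2} \\
& - (p_3 - p_4)[p_6(1 - p_6)]^{1/2}[p_3 p_6 + p_4(1 - p_6)]^{-1/2}[1 - p_3 p_6 - p_4(1 - p_6)]^{-1/2} = 0,
\end{aligned} \tag{62}$$

and

$$T_M(H_4) = \frac{G_4(\hat p)^2}{\sum_{j=1}^{6} \left( \frac{\partial G_4}{\partial p_j} \right)^2 \frac{\hat p_j (1 - \hat p_j)}{\nu_j}},$$

as seen before.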
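Under Model II the statistic for $H_4$ involves six binomial proportions. Here $p_5 = P_1$ and $p_6 = P_3$ are estimated by $\hat p_5 = \nu_1/\nu_5$ and $\hat p_6 = \nu_3/\nu_6$. The Python sketch below evaluates $T_M(H_4)$, again approximating the derivatives of (62) by central differences. That approximation is our choice, not the text's. The names and data are hypothetical.

```python
import math


def G4(p):
    """Left-hand side of (62); p = (p1, ..., p6) with p5 = P1, p6 = P3."""
    def phi(pa, pb, P):
        m = pa * P + pb * (1 - P)                  # P{x = 1} in the population
        return (pa - pb) * math.sqrt(P * (1 - P) / (m * (1 - m)))
    return phi(p[0], p[1], p[4]) - phi(p[2], p[3], p[5])


def T_H4(p_hat, nu, h=1e-6):
    """G4(p_hat)^2 over its estimated variance, with nu = (nu_1, ..., nu_6),
    nu_5 = N1 and nu_6 = N2; derivatives by central differences."""
    var = 0.0
    for j in range(6):
        up, dn = list(p_hat), list(p_hat)
        up[j] += h
        dn[j] -= h
        deriv = (G4(up) - G4(dn)) / (2 * h)
        var += deriv ** 2 * p_hat[j] * (1 - p_hat[j]) / nu[j]
    return G4(p_hat) ** 2 / var


# Hypothetical Model II data with N1 = N2 = 100.
nu = [40, 60, 30, 70, 100, 100]
a = [24, 18, 15, 28]
p_hat = [a[i] / nu[i] for i in range(4)] + [nu[0] / nu[4], nu[2] / nu[5]]
stat = T_H4(p_hat, nu)
```

Unlike the Model I version, here the sampling variability of $\hat p_5$ and $\hat p_6$ also enters the denominator.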
7. AN EXAMPLE 


In the experiment described under Model II, the sample sizes $N_1$ and $N_2$ were selected by the experimenter. Gold's theorem assures, however, the same asymptotic distributions for the statistics $T_M$ even if the division of the total sample size $N_1 + N_2$ between the populations $\pi_1$ and $\pi_2$ is done by a chance mechanism. This holds as long as the probability of an individual being chosen from $\pi_1$ rather than $\pi_2$ is not restricted by the hypothesis tested. One may even go a step further: even if the total sample size is the outcome of chance, the asymptotic distributions of the statistic $T_M$ remain unaffected, as long as the hypotheses tested do not restrict the distribution of the sample size. (This can be seen by enlarging the chain with two new states, 00 and 01, to serve as new initial states, where all those who are to be included in the sample occupy state 01 at t = −1.) This type of reasoning may be called for when attempting to analyze the data collected in the study mentioned in the introduction, reference [9].

In this study the over-all sample size was the result of several selections, some of them possibly random. From the geographical area of interest, six census tracts, previously selected for purposes unrelated to the present query, were canvassed for people over 65 years of age. Among these, individuals with specified types of socio-economic ratings were singled out, and 89 per cent of
these, picked out by an unrevealed method, were actually interviewed. As 
already mentioned, the characteristics studied were physical disability and 
mental disability; the association of these was to be compared among males and 
females in the given age group and area. The total number of individuals in 
the six census tracts could be regarded as the fixed over-all sample size N and 
a Markov chain could be defined over a sample space consisting of nine points, $\omega^0, \omega^1, \ldots, \omega^8$, where

$\omega^0$ stands for an individual not interviewed,
$\omega^1$ stands for an individual who is male, healthy, sane,
$\omega^2$ stands for an individual who is female, healthy, sane,
$\omega^3$ stands for an individual who is male, sick, sane,
$\omega^4$ stands for an individual who is female, sick, sane,
$\omega^5$ stands for an individual who is male, sick, insane,
$\omega^6$ stands for an individual who is female, sick, insane,
$\omega^7$ stands for an individual who is male, healthy, insane,
$\omega^8$ stands for an individual who is female, healthy, insane.

The state space consists of 10 points, $S = \{00, 01, 1, 2, \ldots, 8\}$; the random variables of the chain are:

$$\xi_t(\omega^0) = 00 \quad \text{for } t = -1, 0, 1, 2,$$
$$\xi_{-1}(\omega^i) = 01 \quad \text{for } i = 1, 2, \ldots, 8,$$

and $\xi_t(\omega^i)$ for $t = 0, 1, 2$ and $i = 1, 2, \ldots, 8$ is as defined by (55).


The chain can be pictured as in Graph 1. 


GRAPH 1. States and transitions of Markov chain representing a three-way classification arising in a mental health survey. (See Section 7.)


If one attempted to analyze the data statistically, one could take the position 
that the hypothesis claiming equality of a selected measure of the strength of 





906 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


association for the two sexes was in no way restricting the various mechanisms 
that entered into the selection of the interviewed individuals and that therefore 
the methods presented above would permit valid conclusions for whatever 
parent population the interviewed group represents. On the other hand, if it is 
suspected that the prevailing ratio of the sexes in this age group is itself at 
least in part a reflection of a differential association between mental illness 
and physical disability with its consequence, mortality, then methods that can 
utilize the additional information in the sex ratio would be preferable to the 
one presented. 

In the survey quoted above no significance tests are reported concerning the 
coefficients of association, but the observed value of ¢ for men was so much 
larger than for women in the age group 65-75 that it was taken to indicate that 
in this age group severe mental disorders of men are more highly associated 
with severe physical illnesses than those of women. This in turn suggests that 
in this age group the sexes are perhaps afflicted with different illnesses. 
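The comparison described above, between a coefficient of association computed for men and one computed for women, can be sketched as follows. The coefficient used in [9] is not identified in this section; the phi coefficient of a 2×2 table is assumed here purely for illustration, and the cell counts are hypothetical, not data from the survey.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2x2 table (assumed measure of association).

    Cell layout:
                           mentally ill   not mentally ill
        physically ill          a                b
        not physically ill      c                d
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a margin is zero; phi is undefined")
    return (a * d - b * c) / denom


# Hypothetical counts, for illustration only (not data from [9]).
tables = {"men": (30, 70, 20, 380), "women": (25, 125, 35, 415)}
for sex, cells in tables.items():
    print(sex, round(phi_coefficient(*cells), 3))
```

A formal comparison of the two coefficients would still need a test such as those discussed in this article; the sketch only computes the descriptive values.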


8. CONCLUDING REMARK 


It should be noted that in this discussion the Markov chain is used as a 
technical device to establish the asymptotic behavior of a statistic. The values 
t=0, 1, 2 are not interpreted as time, nor are the transition probabilities inter- 
preted as the (conditional) probabilities of acquiring the characteristics studied, 
and therefore the outcome of the tests described does not give any information 
about the latter. 

In problems where interest centers more appropriately in these latter prob- 
abilities than in the measures of association, the need for models involving 
time explicitly and for observations taken at different times has already been 
pointed out by several authors. (See, for example, Dorn [3], Sartwell and 
Merrell [15], Fix and Neyman [5], Neyman [11, 12], and Wijsman [18].) The 
Markov chain discussed here, with t representing time, could serve as such a
model in cases where the assumptions on the transition probabilities correspond 
to the realities of the problem. 


APPENDIX 


Proof of Corollary. We can write $p_i = f_i(\theta)$, $i = 1, 2, \ldots, v$, and because of the assumption

$$\left| \frac{\partial f_i}{\partial \theta_j} \right| \neq 0$$

at all points of the range of $\theta$, the equations have unique solutions $\theta_i = F_i(p)$, $i = 1, 2, \ldots, v$, at every point of the range $\{f_i(\theta),\ i = 1, 2, \ldots, v\}$, so that in this range the representation $p_i = f_i(\theta)$ does not imply any restrictions on the value of $p$. Let $p_0$ be the true value of $p$ and let $\delta > 0$ be chosen so small that $\{p : |p - p_0| < \delta\}$ is within the range of variation of $\{f_i(\theta)\}$. For any $\epsilon > 0$ there exists an $n'(\epsilon, \delta)$ so large that for $n > n'(\epsilon, \delta)$

$$P\{|q - p_0| < \delta\} > 1 - \epsilon$$

(here $n > n'$ means $n_j > n_j'$ for every $j$). Thus for $n$ sufficiently large the value $p = q$ falls within the range of variation of $\{f_i(\theta),\ i = 1, \ldots, v\}$ with probability larger than $1 - \epsilon$. Since $\chi^2 = 0$ for $p = q$ and $\chi^2 \geq 0$,

$$P\{\chi^2(\Omega) = 0\} > 1 - \epsilon \quad \text{for } n \text{ large enough},$$

$$\lim_{n \to \infty} P\{\chi^2(\Omega) = 0\} = 1,$$

and, therefore,

$$\lim_{n \to \infty} P\{T > \chi^2_\alpha\} = \lim_{n \to \infty} P\{\chi^2(H) > \chi^2_\alpha\}.$$


ACKNOWLEDGMENT 


It is a pleasure to use this opportunity to express my appreciation to Dr. 
Rosedith Sitgreaves and to Dr. John W. Fertig for the many helpful sugges- 
tions I received from them while preparing this paper. 

My thanks are due also to a referee for additional bibliography. 


REFERENCES 


[1] Anderson, T. W. and Goodman, L., “Statistical Inference about Markov Chains,” 
Annals of Mathematical Statistics, 28 (1957), 89-110. 

[2] Bartlett, M. S., “Contingency Table Interactions,” Journal of the Royal Statistical
Society, Supplement, 2 (1935), 248-52. 

[3] Dorn, H. F., “Methods of Measuring Incidence and Prevalence of Disease,” American 
Journal of Public Health, 41 (1951), 271-8. 

[4] Fisher, R. A., “On the Interpretation of Chi-Square from Contingency Tables, and 
the Calculation of P,” Journal of the Royal Statistical Society, 85, Part I (1922), 
87-94. 

[5] Fix, E. and Neyman, J., “Simple Stochastic Models of Recovery, Relapse, Death, 
and Loss of Patients,” Human Biology, 23 (1951), 205-41. 

[6] Gold, R. Z., “Inference about Markov Chains with Non-Stationary Transition
Probabilities,” Ph.D. Dissertation (1960), Columbia University Library. 

[7] Goodman, L. A. and Kruskal, W. H., “Measures of Association for Cross Classifica- 
tions,” Journal of the American Statistical Association, 49 (1954), 732-64. 

[8] Lancaster, H. O., “Complex Contingency Tables Treated by the Partition of χ²,”
Journal of the Royal Statistical Society, Series B, 13 (1951), 242-9. 

[9] New York State Department of Mental Hygiene, Mental Health Reserve Unit, “A 
Mental Health Survey of Older People,” Psychiatric Quarterly Supplement, Parts 
I and II (1959), Part I (1960). 

[10] Neyman, J., “Contribution to the Theory of the χ²-Test,” Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California Press,
Berkeley (1949), 39-274.

[11] Neyman, J., First Course in Probability and Statistics, New York: Holt, Rinehart, and
Winston, Inc., 1950.

[12] , “Statistics—Servant of All Sciences,” Science, 122 (1955), 401-6. 

[13] Norton, H. W., “Calculation of Chi-Square for Complex Contingency Tables,” 
Journal of the American Statistical Association, 40 (1945), 251-8. 

[14] Reiersøl, O., “Tests of Linear Hypotheses Concerning Binomial Experiments,”
Skandinavisk Aktuarietidskrift, 37 (1954), 38-59.

[15] Sartwell, P. E. and Merrell, M., “Influence of Dynamic Character of Chronic Disease
on the Interpretation of Morbidity Rates,” American Journal of Public Health, 42 
(1952), 579-84. 





[16] Simpson, E. H., “The Interpretation of Interaction in Contingency Tables,” Journal
of the Royal Statistical Society, Series B, 13 (1951), 238-41.

[17] Wald, A., “Tests of Statistical Hypotheses Concerning Several Parameters When
the Number of Observations Is Large,” Transactions of the American Mathematical
Society, 54 (1943), 426-82.

[18] Wijsman, R. A., “Contribution to the Study of the Question of Association between
Two Diseases,” Human Biology, 30 (1958), 219-36.

[19] Yule, G. Udny, “On the Association of Attributes in Statistics: with Illustrations
from the Material from the Childhood Society, etc.,” Philosophical Transactions A,
194 (1899), 257-319.


Supplement to References 
(compiled by referee) 


Bhapkar, V. P., “Contributions to the statistical analysis of experiments with one or 
more responses (not necessarily normal),” North Carolina Institute of Statistics, 
Mimeo series No. 229 (1959). 

Bhapkar, V. P., “Some tests for categorical data,” Annals of Mathematical Statistics, 
32 (1961), 72-83. 

Cramér, H., Mathematical Methods of Statistics, Princeton University Press, 1946,
Chapter 30.

Diamond, E. L., “Asymptotic power and independence of certain classes of tests on
categorical data,” North Carolina Institute of Statistics Mimeo Series No. 196 (1958).

Diamond, E. L., Mitra, S. K., and Roy, S. N., “Asymptotic power and asymptotic
independence in the statistical analysis of categorical data,” Proceedings of the 31st
Session of International Statistical Institute (1958).

Mitra, S. K., “Contributions to the statistical analysis of categorical data,” North
Carolina Institute of Statistics Mimeo Series No. 142 (1955).

Mitra, S. K., “On the limiting power function of the asymptotic χ²-test,” Annals of
Mathematical Statistics, 29 (1958), 1221-33.

Ogawa, J., “On the mathematical principles underlying the theory of the χ²-test,”
North Carolina Institute of Statistics Mimeo Series No. 162 (1957).

Roy, S. N. and Mitra, S. K., “An introduction to some non-parametric generalizations
of analysis of variance and multivariate analysis,” Biometrika, 44 (1956), 361-76.

Roy, S. N., Some Aspects of Multivariate Analysis, New York: John Wiley and Sons,
1957, Chapter 15.

Roy, S. N. and Bhapkar, V. P., “Some non-parametric analogs of ‘normal’ ANOVA,
MANOVA and of studies in ‘normal’ association,” Contributions to Probability and
Statistics (Hotelling dedicatory volume), Stanford: Stanford University Press, 1960,
371-87.

Silvey, S. D., “Lagrangian multiplier test,” Annals of Mathematical Statistics, 30
(1959), 389-407.





FAILURE OF ENUMERATORS TO MAKE ENTRIES OF ZERO: 
ERRORS IN RECORDING CHILDLESS CASES 
IN POPULATION CENSUSES 


M. A. El-Badry
Demographic Training and Research Centre, Bombay¹


It has been observed that enumerators sometimes fail, for one reason 
or another, to record a zero answer properly on the schedule, thus 
leading to the eventual tabulation of the answer as “not given” and 
introducing error in the data. This article indicates the existence of 
this error in the data on children ever born which have been collected 
in many censuses. A study of the covariation of the proportions tabu- 
lated as childless and as “not given” in the various age groups has 
shown the existence of a linear or nearly-linear relationship between 
the two proportions in many instances. This linearity can be utilized, 
as explained here, to detect the existence of the error and to adjust 
the data, under certain conditions.


1. INTRODUCTION: THE ZERO ERROR 


This article studies the extent to which childless women have no report on
children ever born to them in population censuses. The material relates
to census data on children ever born only, but failure to make a zero entry
is obviously a cause of error in other subjects and in other types of survey
operations also.

The census question on number of children ever born has provided dem- 
ographers with a very useful body of data for studying various aspects of 
fertility. However, answers to this question have shown several deficiencies 
which need more attention, particularly in countries where, because vital 
registration is non-existent or highly deficient, the census question on parity 
is the main source of information on this important aspect of human growth. 
Attention is needed in wording the question, in recording the answers and 
in evaluating and adjusting the data before they are used. 

The discussion here concerns mainly the existence of an error in the children- 
ever-born data which arises in recording the entry belonging to women who 
have not had any children. In such cases the enumerators (or the individuals in 
self-enumerated censuses) are normally instructed to insert in the relevant 
space a zero or some specified mark denoting zero. Sometimes the instructions 
are not followed properly and the space is left blank or filled with a mark dif- 
ferent from the one given in the instructions (like a “—” for example). The 
same situation also arises sometimes when the person who fills in the schedule 
makes the erroneous assumption that the question is irrelevant, such as in the 
case of never married women, and the zero or the specified mark is not inserted. 
When the schedules are returned for processing, such cases are ambiguous 
and the coders have no way of knowing whether “number of children not 
stated” or “respondent has borne no children” is intended. A frequent procedure 
is to tabulate all of these ambiguous cases as “number of children not given.”





1 On leave from Cairo University, U.A.R. 




The possible seriousness of this “Zero Error” arises because the number of
childless women given by the tables will be underestimated, which will in turn
bias the results of any analysis based on the number of childless women (the
study of sterility, for example). It will also lead to an increase in
the average number of children ever born per woman whenever this average is 
calculated for the women whose number of children was given. This is ob- 
viously true because, while the total number of children is not affected by this 
error, the denominator, namely the total number of women, is reduced by the 
number of those childless women who have been erroneously tabulated in the 
“number of children not given” cell. 

Another reason why this error can be serious is that it does not affect all age
or marriage-duration groups of women evenly; it is apt to happen
more frequently in the young ages or short marriage durations, where the 
proportion of childless women is high, and then decrease with the increase in 
age or duration as the proportion of childless women gets smaller and smaller. 
Age or duration specific rates will thus be influenced, and the extent to which 
they are affected will generally decrease with the increase in age or duration. 

Moreover, the incidence of this error is quite likely to vary from one area to
another, depending upon the care with which data collection and field
supervision are carried out. The study of differential
fertility between regions or between educational, occupational or other groups 
is thus bound to be influenced by this error whenever it is large. 


2. ERRORS IN CENSUS PARITY DATA 


Even though most of the reports on children ever born may well be correct, 
the data are exposed to several sources of error. For a clear understanding of 
the discussion in this article, it is important to point out first the main types 
of these errors: 

Misunderstanding the question itself can give rise to a number of errors: 
(a) the mortality error which arises when the individuals give the number of 
children who are still alive rather than those ever born; (b) the non-resident 
error, caused by overlooking live children who are not residing with the
mother (a married daughter, or a child staying with a grandmother, etc.);
and (c) the marriage error, which occurs when the question applies to all
marriages and a woman who has been married more than once does not include
her children by previous marriages in her report. The same error also
occurs with illegitimate children. The three types of error in this group
can also arise from lack of knowledge when someone else is reporting for
the woman. Children by a previous marriage and illegitimate children are also
sometimes deliberately omitted.²

Next comes a group of errors arising from the respondents’ lapse of memory 
or neglect, even though the question is properly understood: (d) the memory 
error, caused by a tendency on the part of some mothers, particularly
the elderly, to forget to include in the report some of their children who had 
died, especially those who died in infancy and early childhood; and (e) the 





2 In this group we also encounter such errors as the inclusion of an adopted child or a stepchild, or the inclusion of
stillbirths as live births or vice versa, etc.





ERRORS IN REPORTING CHILDLESS CASES 911 


baby error, caused by the common tendency to omit young living children
from the report.

The third group of errors is due to enumerators’ failure to reach the in- 
dividuals: (f) the not-at-home error which occurs when, after failing to find a 
knowledgeable adult, the enumerator finally asks a neighbor or anybody in the
household who does not know the woman’s parity, and consequently she is
eventually tabulated as “not given.” Bias arises in this case because women who
are not at home usually have fewer children; and (g) the coverage error
which arises when the enumerator does not ask the question on parity in a 
whole area or forgets, occasionally, to record the answer which he had received. 

Finally we have (h) the zero error which is a recording error. 

Very little can be done to adjust the first group, namely the mortality, the 
non-resident and the marriage errors on the basis of the tabulated data alone, 
but they can well be reduced by careful wording of the question.³ When a
non-increasing fertility pattern prevails, the memory error is revealed by the 
observation that after a certain age around 45, the average number of children 
ever born, as calculated from the tables, decreases rather than increases; it is 
also shown in some censuses by an increase in the percentage “not given” 
among successively older cohorts starting at age 40 or 45. Under these cir- 
cumstances it may be plausible, in partial adjustment of the error, to take as 
an estimate of completed fertility the average parity in the last age group 
after which the decrease starts and discard the reported parities in subsequent 
age groups. 
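The partial adjustment of the memory error described above can be sketched as follows, assuming mean parities have already been computed by age group and are ordered from youngest to oldest. The function name and the figures in the example are hypothetical; the rule simply keeps the mean parity of the last age group before the first decline and discards the older groups.

```python
def completed_fertility_estimate(mean_parity_by_age):
    """Return (age_group, mean_parity) for the last age group before the
    reported mean number of children ever born starts to decrease.

    mean_parity_by_age: list of (age_group_label, mean_parity) pairs,
    ordered from the youngest to the oldest age group.
    """
    if not mean_parity_by_age:
        raise ValueError("no age groups given")
    pairs = zip(mean_parity_by_age, mean_parity_by_age[1:])
    for (label, mean), (_, next_mean) in pairs:
        if next_mean < mean:
            # The decline starts after this group: discard the older groups.
            return label, mean
    # No decline found: the oldest group is the best available estimate.
    return mean_parity_by_age[-1]


# Hypothetical mean parities, for illustration only.
mean_parity = [
    ("15-19", 0.3), ("20-24", 1.5), ("25-29", 2.8), ("30-34", 3.9),
    ("35-39", 4.7), ("40-44", 5.2), ("45-49", 5.4), ("50-54", 5.1),
    ("55-59", 4.8),
]
print(completed_fertility_estimate(mean_parity))  # ('45-49', 5.4)
```

Because the text places the decline "after a certain age around 45", a more cautious version might search for the decline only from about age 40 onward, so that sampling noise in young ages is not mistaken for the memory error.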

Possible measures for restraining the zero error during the data collection 
will be mentioned in section 7. When we deal with data which have already 
been tabulated it is still possible in many instances, however, to detect and 
adjust the zero error. This latter study, which is the main purpose of this
article, is dealt with in sections 3-5. Section 6 gives illustrations drawn from
the data and from the experience gained in a number of censuses where the
question on parity was asked, particularly those of Asian countries.


3. VARIATION WITH AGE IN THE PROPORTION OF WOMEN WITH PARITY NOT GIVEN: 
THE CHILDLESSNESS ERROR 


Let us consider a census where the errors resulting from misunderstanding 
the parity question—namely the mortality, non-resident, and marriage errors— 
and the coverage error do not give rise to considerable bias over age in the 
proportion of non-reports to the parity question among women for whom at 
least the age was reported, except in so far as causing a zero reply which will in 
its turn give rise to a zero error. 

If we denote by “C” the proportion tabulated in the “not given” category in 
the table on parity by age of the woman then, in the absence of the not-at-home 
and the zero errors, the proportions C in all age groups would not show a signif- 
icant trend as age increases up to 40 or 45 years, after which there would be an 





3 The mortality error can be reduced considerably by asking two questions instead of one: the first about the 
number of children who are still alive and the second regarding the number of those who had died. The non-resident 
error necessitates, when asking about the number still alive, the addition of the phrase “including those who are 
not living with you.” When the parity question is pertinent to all marriages, the number of children to previous 
marriages should also be obtained. 







upward trend in C resulting from a more pronounced memory error. But when 
either or both of the not-at-home and the zero errors have occurred, their
incidence will be higher in the younger ages and will decrease with the increase in
age, because the not-at-home error is in general negatively correlated with the 
number of children and the zero error arises only in the childless cases. In this 
situation the values of C will show a gradual decrease as age increases until 
about age 40 or 45, followed by the increase due to lapse of memory. 

Calculations of the proportions C show that this latter pattern existed in 
many countries and communities where the question on number of children 
ever born was asked. It existed for instance in the censuses of Japan (1950), 
Philippines (1948), Malaya (1947 and 1957), Singapore (1947), India (1951), 
Egypt (1947), France (1946) and the United States (1940). In all these censuses 
the pattern of variation in C with age was the same though, naturally, the 
proportions “not given” differed in their magnitude from one census to another. 
This pattern is shown in figure 1 for some of the censuses. 

The occurrence of this downward sloping trend in the values of C in ages 
below 40 indicates, under the conditions assumed in the leading paragraph 
of this section, the existence of one or both of the not-at-home and the zero 
errors. We must note, however, that the contribution of the not-at-home error 
to C does not involve all the not-at-home cases; it is limited to those where the 
neighbor or someone in the household gave at least the age (his own estimate 
in many cases) but failed to answer the question on number of children ever 
born. This contribution can be important in some countries, but there are 
many communities where it cannot be significant; for example, areas where 
neighbors know each other or where women do not go out frequently. This 
would be the case in rural and not-so-urbanized areas, as in most of the coun- 
tries of Asia with which this report is mainly concerned. Even in Japan, the 
1950 census data supply evidence, which will be presented in section 6.1, show- 
ing that the not-at-home error must have been quite limited. Consequently, 
the decreasing trend in the values of the proportion C in such areas would be 
due mainly to the zero error. 

In fact, tabulations of women by age and parity in these areas show clearly 
that the variation in C within the ages below 40 or 45 can be accounted for 
mainly in terms of childlessness. This conclusion is arrived at by studying the 
covariation of C with the proportion of women tabulated as childless, from one
age group to another. The very high degree of association, presented in figure 2, 
shows that the decrease in C can be accounted for mainly in terms of one or 
both of the following two factors: 


a) the zero error which happens only in the childless cases, 

b) non-reports for childless women of whom at least the age has been re- 
ported: particularly the not-at-home error for childless women which 
occurs when the neighbor says: “I do not know, I have not seen any 
children with her” and the case is then recorded as “not given.” 


In the above mentioned areas where the non-reports on parity among 
actually childless women for whom at least age was reported are bound to be 
limited, the latter source would probably have little effect, compared to the 





[Fig. 1: percentage C (vertical axis) plotted against age (horizontal axis) for several censuses.]
Fig. 1. Variation with age in the percentage “C” of women with number of children not given.


zero error, on the variation in C. Conclusive evidence can be supplied, how- 
ever, only by post-enumeration checks (one form of which is the 1957 Malayan 
experience referred to in section 6.4 where nearly all the non-reports were found 
to be for never married childless women, i.e., cases where a zero entry should 
have been made) or by a careful study of the information available on the 
census schedule (such as the test, also referred to in section 6.4, which was
carried out in Malaya and Singapore in 1947 and which found that in
all but a small fraction of cases the non-reports were actually young, never-married








[Fig. 2: scatter plots of C, the percentage tabulated “not given” (vertical axis), against L, the percentage tabulated childless (horizontal axis). Panels: (A) Japan, 1950, total population; (B) Japan, 1950, urban population; (C) Japan, 1950, rural population; Philippines, 1948; Federation of Malaya, 1947; France, 1946; (I) United States, 1940.]
Fig. 2. Relation between tabulated percentage “childless” and tabulated percentage “not given.”


and childless). In the absence of such evidence, however, the census tables 
themselves will furnish no means of separating these two factors. This is why 
we are going to study here their combined effect, namely, the “childlessness 
error,” which, in many of the areas considered in this article, will probably
consist mainly of the zero error.

As far as the adjustment of the data is concerned, there is no need to isolate
the zero-error cases. What is actually needed is an estimate of the
total number of childlessness-error cases in each age group so that they
can be shifted from the “not given” to the no-children category.

We proceed now to show how the table on parity by age can be utilized to 
find whether the childlessness error was the main cause of decrease up to age 
40 or 45 in the proportion C. We will then show how to estimate the degree of 
its incidence, and consequently how to adjust the parity data, under certain 
conditions. 


4. RELATIONSHIP BETWEEN THE PROPORTIONS TABULATED AS CHILDLESS AND 
AS “PARITY NOT GIVEN” 


Let us consider now the covariation of C, the proportion tabulated in the 
“not given” parity category, and L, the corresponding proportion tabulated as 
childless, in ages below 40. In a study of this covariation in a large number of 
countries and communities, particularly those in Asia, it was found that in 
many of these areas there existed a linear or nearly-linear relationship between 
C and L. This linearity has been observed, for instance, in both the rural and 
the urban areas of Japan (1950), in the Philippines (1948), the various races in 
Malaya (1947 and 1957), and in Singapore (1947). It has also been observed in 
the various provinces of Egypt (1947) and in France (1931 and 1946). The 
relationships between C and L in some of these areas are shown in figure 2. It 
must be emphasized, however, that the linearity of the relationship is not 
universal: the 1940 census results of the United States for instance show a 
strong but non-linear relationship between the two proportions (see figure 
2(I)). 


The linear relationship

$$C_x = aL_x + b \qquad (1)$$

between the pairs of proportions $(C_x, L_x)$ at ages below 40 means that the
proportion $C_x$ of the “not givens” of age x is composed of two parts: a constant
and a constant multiple of the proportion of women tabulated as childless at
that age. In other words, the proportions C are accounted for completely, in
this case, by a constant amount which does not vary with age, plus childless-
ness, and nothing more.

The linear relationship (1) can well be explained as follows in areas where,
for women below 40, the childlessness error is the main cause of variation in
C and other sources of error affecting C are nearly unbiased. If we assume that
γ, the degree of incidence of the childlessness error among childless women,
is constant over age, and if we denote by $R_x$ the proportion of actually childless
women of age x, then

$$C_x = \gamma R_x + b \qquad (2)$$


where b is the contribution of factors arising from the enumerator’s failure to
ask the question, to record the answer or to get an answer when he asks this
particular question, even though he managed to obtain an answer to at least
the question on age. Now since $R_x$ is larger than $L_x$ by an amount equal to the
childlessness error, i.e.





$$R_x - L_x = \gamma R_x,$$

then (2) can also be expressed in terms of $L_x$ in the form:

$$C_x = \frac{\gamma}{1 - \gamma} L_x + b \qquad (3)$$

Comparison between (1) and (3) enables us to interpret the linearity ob- 
served in many communities between the tabulated proportions “not given” 
and childless. This linearity is in fact the relationship that would arise when 
the data collection permits a certain degree of childlessness error in addition
to another degree of actual “not givens” which is caused, among women below 
40 for whom at least the age was reported, by nearly unbiased factors. 
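The step from equation (2) to equation (3) can be checked numerically. The sketch below assumes hypothetical values of γ, b and the true childless proportions $R_x$, builds the tabulated proportions $L_x$ and $C_x$ from $R_x - L_x = \gamma R_x$ and equation (2), and confirms that every point lies on a line with slope $\gamma/(1-\gamma)$ and intercept b.

```python
def tabulated_proportions(true_childless, gamma, b):
    """Sketch of the model of section 4.

    true_childless: R_x, the actual proportion childless in each age group;
    gamma: incidence of the childlessness error among childless women;
    b: age-invariant proportion of genuine "not given" cases.
    Returns (L, C), the tabulated childless and "not given" proportions.
    """
    L = [(1 - gamma) * r for r in true_childless]  # from R_x - L_x = gamma * R_x
    C = [gamma * r + b for r in true_childless]    # equation (2)
    return L, C


gamma, b = 0.2, 0.01                 # hypothetical values
R = [0.80, 0.35, 0.15, 0.10]         # hypothetical R_x by age group
L, C = tabulated_proportions(R, gamma, b)
a = gamma / (1 - gamma)              # slope predicted by equation (3)
for l_x, c_x in zip(L, C):
    assert abs(c_x - (a * l_x + b)) < 1e-12  # each point lies on C = aL + b
print(round(a, 4))  # 0.25
```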


5. USE OF THE LINEAR RELATIONSHIP TO ESTIMATE THE CHILDLESSNESS ERROR 
AND TO ADJUST THE NUMBERS CHILDLESS 


It has been shown in the preceding section that when the points $(C_x, L_x)$
corresponding to ages below 40 fall nearly on a straight line

$$C_x = aL_x + b,$$

then, assuming that other errors which cause the woman to be tabulated as
“parity not given” are nearly unbiased for ages below 40, the slope
of the straight line is explicable by the childlessness error, and the constant b
will then be an estimate of the actual proportion of non-reports to the parity
question among women for whom at least age was reported.

When the straight line is fitted, it will provide not only an estimate of this
latter proportion of non-reports but also an estimate of the extent of the
childlessness error. The estimate of the actual percentage “not given” is simply
the intercept of the line on the C axis. Comparison between equations
(1) and (3) in the preceding section also shows that the slope a of the fitted
line is equal to $\gamma/(1-\gamma)$; in other words, the degree of incidence of the zero
error is estimated as

$$\gamma = \frac{a}{1 + a}.$$
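The fitting step can be sketched as follows. The article does not say how the line is fitted; ordinary least squares is assumed here, and the (L, C) points are hypothetical proportions constructed to lie exactly on C = 0.25L + 0.01, so the estimates should come back as a = 0.25, b = 0.01 and γ = 0.2.

```python
def fit_childlessness_line(points):
    """Ordinary least-squares fit of C = a*L + b (assumed fitting method).

    points: (L, C) proportion pairs for age groups below 40.
    Returns (a, b, gamma), where gamma = a / (1 + a) estimates the
    incidence of the childlessness error among childless women.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two age groups")
    mean_l = sum(l_x for l_x, _ in points) / n
    mean_c = sum(c_x for _, c_x in points) / n
    s_ll = sum((l_x - mean_l) ** 2 for l_x, _ in points)
    if s_ll == 0:
        raise ValueError("L does not vary across the age groups")
    s_lc = sum((l_x - mean_l) * (c_x - mean_c) for l_x, c_x in points)
    a = s_lc / s_ll          # slope
    b = mean_c - a * mean_l  # intercept: estimated genuine "not given" proportion
    return a, b, a / (1 + a)


# Hypothetical (L, C) points lying exactly on C = 0.25 L + 0.01.
points = [(0.64, 0.17), (0.28, 0.08), (0.12, 0.04), (0.08, 0.03)]
a, b, gamma = fit_childlessness_line(points)
print(round(a, 4), round(b, 4), round(gamma, 4))  # 0.25 0.01 0.2
```

With real census proportions the points will not lie exactly on a line, and how well they do is itself part of the evidence that the linear relationship holds.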

Adjustment of the childlessness error can readily be carried out once the value
of b is known. In fact the difference $C_x - b$ is the result of the childlessness
error, and the corresponding cases should be shifted back to the zero-children
group. Thus the adjustment in age group x is made first by subtracting b from
the value of $C_x$ and second by multiplying the difference $C_x - b$ by the total
number of women in that age group. This latter product is the adjustment that
should be deducted from the tabulated “not givens” and added to the tabulated
number of childless women in that age group.

Once we find evidence strongly suggesting the existence of the linear
relationship between C and L in a census, it may not be necessary to fit straight
lines in order to estimate the childlessness error and the actual “not givens”
in smaller groups of the population. Sufficiently accurate estimates of γ and
b may be obtained from the values of the proportions C and L in two
adequately large age groups, 20-24 and 25-29 say. When these proportions,
$(C_1, L_1)$ and $(C_2, L_2)$, have been calculated, then obviously

$$a = \frac{C_1 - C_2}{L_1 - L_2}, \qquad \text{i.e.} \quad \gamma = \frac{C_1 - C_2}{(C_1 - C_2) + (L_1 - L_2)},$$

and b is subsequently estimated by substituting for a in the linear equation

$$C_1 = aL_1 + b.$$
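The two-group shortcut can be sketched as follows. The function name is hypothetical, and the (C, L) proportions in the example are invented and chosen to lie on C = 0.25L + 0.01, so the formulas above should return a = 0.25, γ = 0.2 and b = 0.01.

```python
def two_group_estimates(group1, group2):
    """Estimate a, gamma and b from two age groups (e.g. 20-24 and 25-29).

    group1, group2: (C, L) proportion pairs for the two groups.
    """
    c1, l1 = group1
    c2, l2 = group2
    if l1 == l2:
        raise ValueError("the two groups must differ in L")
    a = (c1 - c2) / (l1 - l2)
    gamma = (c1 - c2) / ((c1 - c2) + (l1 - l2))
    b = c1 - a * l1  # substitute a into C_1 = a L_1 + b
    return a, gamma, b


# Hypothetical proportions lying on C = 0.25 L + 0.01.
a, gamma, b = two_group_estimates((0.08, 0.28), (0.04, 0.12))
print(round(a, 4), round(gamma, 4), round(b, 4))  # 0.25 0.2 0.01
```

Using only two points makes the estimates sensitive to sampling error in either group, which is why the text asks for adequately large age groups.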


6. THE CHILDLESSNESS ERROR IN SOME COUNTRIES 


In this section illustrations and comments on the incidence of the childless- 
ness and the zero errors in some countries, particularly those of South-East 
Asia, are given.⁴ The illustrations aim at presenting evidence available in census
material as well as experience of census authorities and analysts regarding the
occurrence of the zero and childlessness errors. Another aim is to show, in
addition to the applicability and limitations of the technique described in this
article, the very high incidence of the childlessness error in some areas and how 
serious it would be to use their data without adjustment. 


6.1. Japan (1950): 


Only 1.2% of the ever married women in the 10% sample tabulations of the 
1950 census of Japan were classified in the “number of children not given” 
category, an observation which shows clearly that the childlessness error can 
only be of a very limited extent. However, this error exists and is revealed by 
the data presented in figure 2(A). The figure shows that the proportions (C, 
L) for ever married women in all Japan in the age groups between 15 and 40 lie 
very close to a straight line. The equation of this line gives us the following 
information: 


1. the quantity b is nil, i.e., the childlessness error accounts for practically
all the “not givens”;
2. the slope of the line is equal to 0.07, so the childlessness error is
$\gamma = 0.07/(1 + 0.07) \approx 0.065$, or roughly 7%.

This latter percentage means that out of every hundred actually childless cases, 
seven were erroneously tabulated in the “number of children not given” group 
because of the childlessness error. 

The almost exact linearity is also shown by the data for the urban (Shi) and rural (Gun) areas in Japan, as one can see from figures 2(B) and 2(C). The two straight lines show that the childlessness error was 7% and 6% in the urban and rural areas respectively. They also indicate that the true “not givens” are negligible in each of the two sections of the population. The three diagrams for Japan, given in figure 2, thus indicate that practically all the women tabulated as “number of children not given” were actually childless




4 It was not possible to incorporate Ceylon (1946) or Burma (1954) in this study because their parity tables supplied no information on women with “parity not given.”





918 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


and should therefore be shifted back to the zero children cells in the correspond- 
ing age groups. 
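Shifting the misclassified cases back can be sketched as follows. The sketch assumes the model implied by the estimating formulas of this section: a fraction γ of truly childless women is tabulated as “not given,” so the tabulated childless proportion is (1 − γ)L* and the tabulated “not given” proportion is n + γL*, where L* and n are the true proportions. The function name is hypothetical. When b is nil, as for Japan, the adjusted “not given” share for points on the fitted line is essentially zero.

```python
def adjust_for_childlessness_error(c: float, l: float, gamma: float) -> tuple[float, float]:
    """Return (true childless, true "not given") proportions for one age group.

    c: tabulated "not given" proportion; l: tabulated childless proportion.
    Assumes tabulated l = (1 - gamma) * l_true and c = n + gamma * l_true.
    """
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1)")
    l_true = l / (1 - gamma)   # restore the childless women moved into "not given"
    n = c - gamma * l_true     # what remains are the genuine non-reports
    return l_true, n


# Synthetic age group built from gamma = 0.3, l_true = 0.6, n = 0.01
l_true, n = adjust_for_childlessness_error(0.19, 0.42, 0.3)
print(round(l_true, 4), round(n, 4))
```

The adjustment only moves women between the two cells: the sum of the childless and “not given” proportions is the same before and after.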

It is opportune at this stage to take advantage of the fact that the Japanese data were published in greater detail than those of many other countries, in order to give further illustrations of the existence of the childlessness error and to show that this error must have been substantially a zero error.

Further evidence regarding the existence of the childlessness error in 1950 
is available in the 3% sample sub-tabulations by age and marriage duration 
for currently married women who have been married once only. Let us con- 
sider in these tabulations the proportion of the “not givens” in each age group, 
among women whose marriage duration was less than 5 years. The reader will 
notice in Table 1 that the proportion decreases up to age 35 in accordance with 
the pattern shown by women of all marriage durations (see figure 1), but after 
that age the proportion increases again, quickly and to a much higher level than at the young ages. This latter increase cannot be attributed substantially to memory lapses, because the women have been married once only and
for less than 5 years. It has taken place, in all probability, because childlessness 
among wives with less than 5 years of married life increases with age after 35, 
thus giving rise to increasing incidence of the childlessness error whenever the 
data collection permits the occurrence of that error. 


TABLE 1. PROPORTION (PER 10,000) OF WOMEN WITH NUMBER OF CHILDREN NOT GIVEN IN EACH AGE GROUP AMONG THE CURRENTLY MARRIED WOMEN, MARRIED ONCE, WITH MARRIAGE DURATION LESS THAN FIVE YEARS, JAPAN, 1950

Age group    Proportion of the “not givens” (per 10,000)
15-19        241
20-24        167
25-29        115
30-34         94
35-39        173
40-44        400
45-49        417

Source: Population Census of 1950, Special Report: Fertility of Japanese Women, Table 1(b).


Again if we take the currently married women aged 35 to 54 who have been 
married only once and calculate for each age-marriage duration group the 
proportion tabulated as “number of children not given” we obtain the figures 
presented in Table 2. For every age group among these women we notice that 
the proportion of the “not givens” decreases with the increase in marriage 
duration. Such a trend must obviously be due to a factor other than memory, since memory lapses would cause the proportion to increase, rather than decrease, as marriage duration increases. The trend is well accounted for by the childlessness error which, in unison with childlessness itself, decreases as duration increases for women aged 35-54.
The cross-tabulation of women by age and marriage duration furnishes 
evidence that the not-at-home error can only be minute compared to the zero 
error on several grounds: 







1. It is improbable that the respondent will know both age and marriage 
duration but not the number of children ever born. 

2. The trend of the proportion C in each age group in Table 2 cannot be ex- 
plained in terms of a respondent who, after giving the woman’s age and 
marriage duration, fails to give the number of children ever born to her at 
a decreasing rate as marriage duration increases. 

3. Again this cause will fail to account for the fact that among the women
who have been married once only and whose marriage duration was less 
than one year, the number of children ever born was not reported for 4% 
of the cases, which is the largest percentage in any duration interval. It is 
perhaps not possible for the respondent to know that the woman was 
married only once, that the marriage duration was less than one year and 
at the same time fail to know the number of children born during that 
period. Needless to say, the great majority of the women involved are 
childless, or, in other words, cases liable to give rise to the zero error. 


TABLE 2. PROPORTION (PER 10,000) OF WOMEN WITH NUMBER OF CHILDREN NOT GIVEN IN EACH AGE-DURATION GROUP AMONG THE CURRENTLY MARRIED WOMEN, MARRIED ONCE ONLY, JAPAN, 1950

Rows: marriage duration in years (under 5, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35+); columns: age.

Source: Japan, Population Census of 1950, Special Report: Fertility of Japanese Women, Table 1(b).


6.2. Philippines (1948): 


The 1948 census data of the Philippines giving the distribution of ever married 
women by age and number of children ever born also show the close linearity 
between C, the tabulated proportion “not given,” and L, the tabulated proportion childless (figure 2(D)). The fitted line puts the childlessness error at the extremely high level of 62%, which is perhaps the highest among the Asian countries. This enormous error would certainly impair any analysis in which the tabulated data are used without adjustment.

Parity data in that census have been published for 16 out of the 51 provinces. Among the provinces for which the data are published, the highest incidence of the childlessness error was in Agusan, where not a single ever married woman below age 25 was tabulated as childless. In 10 out of the 16 provinces the value of γ was between 67% and 77%, and in only one province was γ as low as 40%.





6.3. Singapore (1947): 

The 1947 census of Singapore was taken along with the census of the Federa- 
tion of Malaya, in which the question on number of children ever born to all 
women aged 15 and above was asked. As in many other areas, the childlessness error occurred in that census, as one can see clearly from figure 2(E), in which the points (Cₓ, Lₓ) show very close linearity. However, the 3% extent of
the error, as given by the fitted line, is quite small compared to neighboring 
countries. This may well be due to the punching instructions to the effect 
that a dash (which was entered in the census schedule in a large number of 
childless cases) should be punched as 00 and that YY (i.e. not stated) should 
be punched only when there was no entry at all [1, article 267, p. 66]. 


6.4. Federation of Malaya (1947) and (1957):


The childlessness error again occurred in the two censuses of Malaya in 
which the parity question was asked, namely the 1947 and 1957 censuses 
(see figure 2(F) pertinent to 1947). The degree of its incidence, however, varied 
to a large extent between the States and also between the two censuses, as may 
be seen clearly from Table 3.⁵

In 1947 the error was as high as 30% in the two States of Penang and Pahang, 
thus causing 30% of the childless women to be erroneously tabulated in the 
“not given” category. The error was also particularly high in Malacca and 
Kelantan; but it was considerably lower in four other states and untraceable 
in the remaining three. 

It is interesting to note the remarks of the 1947 census authorities in this 
connexion. The 1947 census report for Malaya and Singapore points out [1, 
article 266, p. 67] that in an examination of a large number of cases taken from 
books coming from all parts of the country, it was discovered that in all but a very small fraction of the cases for which a dash had been entered or no entry had been made, the woman was young and never married. Furthermore, it was also found that in many cases there was evidence on the schedule (such as the relationship to the head of the household, the age, or the absence of enumerated children) which supported the conclusion that the woman was, in fact, childless. The punchers were then instructed to consider the dashes as
denoting no children; this accounts for the low incidence of the zero error in 
most of the states. Failure to follow these coding instructions, it was stated, 
has given rise to high incidence of the error in the four states of Penang, 
Pahang, Malacca and Kelantan [1, article 263, p. 66]. It will be noticed in 
Table 3, however, that considerably smaller degrees of error still existed in some 
of the other states. 

Substantial improvement in the incidence of the error took place in 1957. Table 3 shows that not only did the overall degree drop from 7% to 3%, but the maximum degree in the 11 states also decreased from 30% to only 6%.

On seeing the 1957 values given in Table 3, Mr. H. Fell, the 1957 Census Superintendent, told the present author that the States with a comparatively higher incidence were the ones that were tabulated first, before attention was given to the cases which had no entry. Some batches of the census schedules


5 In order to avoid fluctuations in small numbers, the values of γ in Table 3 were derived from the values of C and L in the two large age groups 15-24 and 25-39.













TABLE 3. THE CHILDLESSNESS ERROR (PER CENT) IN EACH STATE, FEDERATION OF MALAYA, 1947 AND 1957

State                  1947    1957
Penang                   29
Malacca                  20
Perak                     -
Selangor                  6
Negri-Sembilan            -
Pahang
Johore
Kedah
Kelantan
Trengganu
Perlis

Federation of Malaya





were then sent back for checking in both Selangor and Penang and in nearly 
every blank case the woman was found to be childless and never married. 
The superintendent then gave instructions that never married women for 
whom the relevant space was left blank should be tabulated as childless. 


6.5. India: 


The question on parity was asked in the 1951 Indian census in the three 
states of Travancore-Cochin, Madhya Pradesh, and Bihar. 

Despite instructions to enumerators to insert a zero on the census slip whenever the woman was childless, the zero error did take place in the collected
data. It was so large that, in the tables of Madhya Pradesh and Travancore- 
Cochin, the frequencies of the childless women and those for whom the num- 
ber of children was not given were pooled together in one category named “nil 
returns.” The tables for the state of Bihar were not published. 

The pooling of the childless and the “not givens” does not allow us to estimate and adjust for the childlessness error γ. However, the existence of this error in the State of Madhya Pradesh can clearly be seen from Table 4, in which we notice that the proportions of the nil returns in that state were not
only enormously high but they were four to six times as high as their corre- 


TABLE 4. PROPORTION (PER 10,000) OF “NIL RETURNS” AMONG EVER MARRIED WOMEN IN EACH AGE GROUP, TRAVANCORE-COCHIN AND MADHYA PRADESH, 1951

               Number of nil returns per 10,000
Age group      Travancore-Cochin    Madhya Pradesh
15-24          3191                 4841
25-34           593                 2118
35-44           367                 1998
45+             332                 2127

Source: Census of India, Paper No. 5, 1953, Maternity Data.
















spondents in Travancore-Cochin in ages above 25. The proportions show very 
clearly how serious the effect of deleting the “nil returns” could be on measures 
and conclusions derived from the published figures on parity. 
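The comparison between the two states can be checked directly from Table 4 by dividing the Madhya Pradesh rates by the Travancore-Cochin rates. The dictionary and function names are only for illustration. Above age 25 the ratios come to about 3.6, 5.4 and 6.4, so “four to six times” is a rounded description of the same figures.

```python
# Nil returns per 10,000 ever married women (Table 4, Census of India, 1951)
TRAVANCORE_COCHIN = {"15-24": 3191, "25-34": 593, "35-44": 367, "45+": 332}
MADHYA_PRADESH = {"15-24": 4841, "25-34": 2118, "35-44": 1998, "45+": 2127}


def ratio_by_age(numerator: dict, denominator: dict) -> dict:
    """Ratio of two per-10,000 rates for each age group."""
    return {age: numerator[age] / denominator[age] for age in denominator}


ratios = ratio_by_age(MADHYA_PRADESH, TRAVANCORE_COCHIN)
for age, r in ratios.items():
    print(f"{age:>6}: {r:.1f}")
```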

In an area where birth registration is incomplete, it is unfortunate that 
this error does not permit a reasonably accurate analysis of the census data on 
children ever born. Very little information on fertility differences can safely be 
derived from the published sub-tabulations by urban-rural residence and by 
occupational group. 


6.6. France (1946) and Egypt (1947): 


These two censuses furnish further examples of the occurrence of the childlessness error. The points (Cₓ, Lₓ) show almost perfect linearity in both countries, as one may see from figures 2(G) and 2(H). In France the childlessness error was as high as 37%, and it accounted for practically all the “not givens.” In Egypt this error was still higher, at 44%.

The parity statistics in France have been subject to some degree of uncer- 
tainty due to the large proportion of the “not givens” which existed in the 
consecutive censuses since 1901. The official commentator on the 1901 census 
has raised in his report the question as to whether the investigators were par- 
ticularly negligent in the case of childless women. The 1936 census report also 
indicates, in connexion with the parity data, that the hypothesis which corresponds best to reality is that all the persons who have not answered the question are actually childless.⁶


Vincent [4] has indicated the existence of the zero error in the censuses of France between 1901 and 1936. He has also shown that the 1931 census data exhibit a linear relationship between Cₓ and Lₓ calculated for single years of age between 20 and 30, and he then used the fitted line to estimate the proportion of the actual “not givens” in that census.


6.7. United States (1940): 


This census is one example of the cases where the (Cₓ, Lₓ) curve is non-linear and consequently cannot be explained by the linear model, which holds when the variation in C below age 40 is caused merely by the erroneous tabulation of a certain proportion of childless women in the “number of children not given” category.

It was realized from a scrutiny of the schedules of the “non-reports” that 
those cases had fewer children present in the household than the women who 
did report, which is the usual result when the “not-at-home” error exists among 
women with children. This, of course, does not mean that the coverage and the 
zero errors did not exist. In fact, it was found from an inspection of the popula- 
tion schedules for 1940 and 1950 that many of the non-reports were traceable to 
the relatively few enumerators who made no entries of children ever born for 
any of the women in their enumeration areas and thus apparently did not ask 
the question. The existence of the zero error was also observed, and a check 
box for childless cases was provided in 1950 for replies of “None” after the 1940 
census experience where some enumerators left the item blank instead of 





6 Vincent [4].







entering a zero, despite specific instruction on this point [3, p. 5C-6] and 
[2, pp. 400-3]. 


7. SUMMARY AND CONCLUSIONS 


This article deals with an error that takes place during the process of data 
collection for children ever born in census operations. The error occurs when 
the enumerator gets the answer “None” and then fails to record it properly on 
the census schedule; he leaves the space blank or inserts a certain sign which 
he thinks is equivalent to zero. It also happens in a similar manner in self- 
enumeration. When the schedules reach the coder, he designates such cases as 
“not given” because he finds them different from the cases he has been in- 
structed to classify as zeros. 

The occurrence of the error has already been noticed by census statisticians 
in several parts of the world. In the few post-enumeration checks on this par- 
ticular point, of which the Malayan experience is one example, it was found 
that practically all the cases with dashes or with no entries were actually cases 
where a zero entry should have been made. There is also evidence in tabula- 
tions on parity in a number of censuses which indicates that childless women 
are sometimes erroneously tabulated as “not-given.” 

By studying the covariation of proportions tabulated as childless and as 
“not given” from one age to another it is found that in a large number of regions 
there exists a linear or nearly linear relationship between the two proportions. 
In censuses where other factors do not give rise to considerable bias in the 
proportion of women below 40 for whom age was reported but parity was 
not reported, the linear trend can well be explained in terms of a “childlessness 
error” whereby a certain proportion of childless women are erroneously tabulated as “not given.” This error is actually a combination of the zero error and the other non-reports for childless women, particularly those who were not at home at the time of the interview and for whom someone reported at least the age but could not answer the parity question. The latter component, though sizeable enough in some communities, is probably not significant in the rural
and the not-so-urbanized areas with which most of the discussion here has been 
mainly concerned. Even in Japan the 1950 census data furnish evidence which 
shows that the second component could not have been significant and, conse- 
quently, that the childlessness error was mainly a zero error. 

The zero error can impair the accuracy of any study involving childless 
women; this would be the case for instance in the measurement of sterility or 
in the study of fertility differences between regions where the incidence of the 
error is significantly different. Such a pitfall, arising merely from failure to 
insert a zero properly on the schedule, is a technical fault which could be kept 
well under control by proper planning and execution of the data collection. 

One method which has already been tried is to have a special check box for 
answers of “none” on the schedule so that the enumerator or the respondent 
will not be tempted to leave the item blank. This method has not been found 
to be a complete cure, however, particularly in self-enumeration. Another 
possibility is to assign a specific mark for each of the zero and the “not given” 
entries and to instruct the field supervisors that the blank cases should be 







checked and properly marked before the schedules are dispatched from the
area. In this case the enumerator should be trained to use the “not given” mark 
very sparingly. His work should be reviewed often enough during the enumera- 
tion period to make certain that he is not making an excessive use of such marks 
as the easy way out of situations where an answer is hard to get. 

However, when such data have already been coded and tabulated, the method outlined here can be used whenever there is a nearly linear relationship between the proportions tabulated as childless and as “not given.” This linearity gives an estimate of the degree of incidence of the childlessness error and consequently enables the data to be brought much nearer to reality.

The method does not apply when there are other factors besides childlessness 
which cause considerable bias, as age changes, in the non-reports among women 
below 40 for whom at least the age was reported. This situation exists when the 
women tend to be away from their homes and their parity is not known to the 
neighbors or to the persons in the household whom the enumerator interviews. 
Some adjustment of the non-reports in this case can be carried out during the processing stage, on the basis of the information available on the census schedule. There will still be a need, however, for a properly designed field check of the non-reports on parity as part of the post-enumeration program, in order to look more deeply into the reasons for the non-reports and the biases they cause. Even in areas where there is good reason to believe that bias in non-reports among women below 40 for whom at least the age was reported was mainly due to childlessness, it will still be advisable to include the parity non-reports in the post-enumeration check program. This will not merely verify the presumption but also measure and adjust the zero and other errors in various types of communities or localities. The results of such checks would obviously have been very useful in checking empirically the results arrived at in this article.

The discussion has been confined to census data on the number of children 
ever born but the argument applies equally to all types of survey operation 
and to all items where a zero entry is required to be made by the interviewer or 
by the respondent. 


REFERENCES 


[1] Del Tufo, M. V., Malaya—A Report on the 1947 Census of Population. Kuala Lumpur, 
1949. 

[2] Grabill, Kiser, and Whelpton, The Fertility of American Women. New York: John 
Wiley and Sons, Inc., 1958. 

[3] 1950 United States Census of Population, Special Report Fertility. U. S. Department 
of Commerce, Bureau of the Census, 1955. 

[4] Vincent, Paul, “L’utilisation des statistiques des Familles,” Population 1 (1946), 
143-8. 





THE STATISTICAL ANALYSIS OF INDUSTRY STRUCTURE: 
AN APPLICATION TO FOOD INDUSTRIES¹


Lee E. Preston and Earl J. Be.
University of California, Berkeley 


A stochastic model is used to analyze the changing size distributions 
of large firms in the food processing and chain store food distributing 
industries during 1948-1958. The model is shown to have certain pre- 
dictive usefulness for anticipating changes in these distributions; 
however, the predictive value of the equilibrium distribution is ques- 
tioned. An index of size mobility is developed from the model and the 
value of this index computed for each industry in each year and for the 
entire decade. The index clearly indicates that firms in the processing 
sector were characterized by greater relative size mobility than those 
in the distributing sector over this decade. A number of possible ex- 
planations of and deductions from the variability of the indexes, both 
between industries and over time, are suggested. Although no strong 
correlations are obtained, the descriptive and predictive value of the 
technique and the potential fruitfulness of further statistical work on 
the dynamics of industry structure are strongly indicated. 


1. INTRODUCTION 


In a previous article, Professor Irma Adelman suggested that the changes in the size distribution of a group of business firms over time might be analyzed in terms of a stochastic model and that such a model might be used to generate (1) the size distribution which would be attained in long-run dynamic equilibrium, that is, the steady state, and (2) an index which would describe the “size mobility” of the firms over the time period.³ Application of the model was illustrated in an analysis of data for the steel industry over the years 1929-1939 and 1946-1956. Unfortunately, the method employed in classifying the firms by asset size made it impossible to test the correspondence between the actual behavior of the industry and the changing size distribution predicted by the model; further, the small number of values offered for the index of size mobility made it difficult to assess both its discriminatory power and its analytical significance. In the present paper, Mrs. Adelman’s technique is slightly modified and applied to an analysis of changing sales volumes for the largest food processing firms and the largest chain food stores. This application yields some results of intrinsic interest to students of industry structure and also permits an evaluation of the predictive power of the model and the significance of the index of size mobility.


2. DESCRIPTION OF THE DATA 


The data under analysis are annual sales figures for the years 1948-1958 for 
two groups of firms: (1) chain store food retailers with sales of more than 





1 Giannini Foundation Paper 209. 

2 The authors are, respectively, Assistant Professor of Business Administration and Research Assistant at the 
Operations Research Center, University of California, Berkeley. They are indebted to Professor Irma Adelman for a 
number of helpful comments and suggestions; she is, however, in no way responsible for any errors or differences 
in interpretation which remain. 

3 Adelman, Irma G., “A Stochastic Analysis of the Size Distribution of Firms,” Journal of the American Statistical Association, 53 (December, 1958), 893-904.












$50 million in 1958 and (2) the largest 25 food processing firms in 1958, as presented in a recent report of the U. S. Federal Trade Commission.⁴ The data
follow these identical firms backward in time; thus, for 1948 we have, not the 
sales of the largest firms in that year, but the 1948 sales of firms which were 
large in 1958. This process of collecting data by jobbing backward would not 
be acceptable for all purposes nor for all industries; however, for the purpose 
of the present analysis and in the industries in question, it does not appear that 
any serious omissions result. Our data include the sales of firms which grew 
from small to large over the period but not those (if any) which declined from 
large to small, and the impact of entry and exit of firms from the distribution is 
explicitly excluded. Thus, we analyze the changing size distribution and pattern of mobility among a constant group of firms.⁵

The relative share of each firm in the total sales of each group in each year from 1948 to 1958 was computed, and firms were arrayed by size class according to the
schedule shown in Table 1. The use of percentages of total group size as the 
basis for classification frees the analysis from the effects of changing price 
levels over time; this assumes, of course, that price-level changes affect all firms 
in the same manner. The year-to-year transition movements of firms from one 
size class to another were then tabulated, and ten annual transition matrices 


TABLE 1. LEADING FOOD CHAINS AND FOOD PROCESSING FIRMS, BY RELATIVE SALES SIZE, 1948 AND 1958

Columns: size class (per cent of total sales); for food chains (1948 and 1958) and for food processing firms (1948 and 1958): number of firms, per cent of firms, and per cent of sales in the size class.

Source: Computed from U. S. Federal Trade Commission, Economic Inquiry into Food Marketing, Part I (Washington, 1960), pp. 76-7 and 94.


for each industry were obtained; identical cells in each of these matrices were 
summed into a matrix showing the grand total of all transitional movements 
for each industry for the entire decade. Conversion of each row of this matrix 
to a relative frequency distribution summing to unity produces an empirical 
estimate of the year-to-year transition probabilities.⁶ The resulting transition matrices are shown below.





4 U. S. Federal Trade Commission, Economic Inquiry into Food Marketing, Part I (Washington, 1960), pp. 76-7 and 94.

5 In a few instances, data for the chain retailers are missing in individual years; the transition movements of these firms have been included only for years in which sales sizes are observed. Hence, the number of observed firms increases from 26 in the first year to 33 in the last.

6 This summary matrix has been shown to be the “maximum likelihood matrix” derived from the annual transitional matrices. See T. W. Anderson, “Probability Models for Analyzing Time Changes in Attitudes,” Mathematical Thinking in the Social Sciences, ed. P. F. Lazarsfeld. Glencoe, Illinois: Free Press, 1954, pp. 17-66.





STATISTICAL ANALYSIS OF INDUSTRY STRUCTURE 


Chain stores:

$$P = \begin{pmatrix}
.8950 & .1050 & 0 & 0 & 0 \\
.0110 & .9420 & .0470 & 0 & 0 \\
0 & .0640 & .8070 & .1290 & 0 \\
0 & 0 & .0210 & .9370 & .0420 \\
0 & 0 & 0 & .0190 & .9810
\end{pmatrix}$$

Food processors:

    .2500   0       0       0
    .3333   .6667   0       0
    .0165   .7540   .2295   0
    0       .0342   .9543   .0085
    0       0       .0270   .9460   .0270
    0       0       0       .0357   .9643
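The pooling step described in section 2 (summing the annual count matrices cell by cell and turning each row into relative frequencies) can be sketched as below. Assumptions not taken from the paper: each firm's history is a list of size-class indices 0 to k − 1, one per year, with None for a year in which the firm's sales are not observed (the situation footnote 5 describes for some chains), and the function name is hypothetical. Counting every observed year-to-year move at once gives the same totals as summing the ten annual matrices first.

```python
from typing import Optional, Sequence

import numpy as np


def estimate_transition_matrix(
    histories: Sequence[Sequence[Optional[int]]], k: int
) -> np.ndarray:
    """Pooled year-to-year transition probabilities for k size classes.

    Each history lists one firm's size class (0..k-1) year by year;
    None marks a year in which the firm's sales are not observed.
    """
    counts = np.zeros((k, k))
    for history in histories:
        for prev, curr in zip(history, history[1:]):
            if prev is None or curr is None:
                continue  # a move is counted only if both years are observed
            counts[prev, curr] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows with no observed moves stay all zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


# Three hypothetical firms observed over three years, two size classes
P_hat = estimate_transition_matrix([[0, 0, 1], [1, 1, 0], [0, None, 1]], k=2)
print(P_hat)
```

In the toy example the third firm contributes nothing, because its middle year is missing; the other two contribute one move out of each class to each class, so every row of `P_hat` is [0.5, 0.5].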








3. PREDICTION FROM THE MODEL 


The previous application of this model attempted prediction only in terms 
of the steady state of the transition matrix, a state described as corresponding 
to the size distribution of firms in the industry in long-run dynamic equilibrium. 
In the present instance, however, it is possible to test the accuracy of the model
in predicting the behavior of the distribution over the observed time interval 
and to assess the relevance of the equilibrium distribution as a “prediction” of 
the future state of the industry. 

The predictive accuracy of the model may be examined directly by compar- 
ing the actual distributions of firms by size class in 1958 with those which would 
be generated from the operation of the transition matrices upon the 1948 dis- 
tributions. In Table 2, observed relative frequency distributions for 1958 are 
compared with the distributions generated by the model. The correspondence 
between the actual and expected distributions for each industry in 1958 is 
striking, and a chi-square test indicates that this correspondence is statistically significant.⁷ Strictly speaking, it would be desirable to test the predictive accuracy of the matrix against the actual size distributions for 1959 or any later
year not included in the data from which the transition matrix was generated. 
Unfortunately, exactly comparable data for subsequent years are not available 
at the time of this writing. With the transition matrices and distributions here 
presented, however, it is possible to test any subsequent set of comparable data 
that becomes available. 
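The projection from 1948 to 1958 amounts to ten annual steps, i.e. multiplying the 1948 row vector of class shares by the tenth power of the transition matrix. The sketch below uses the chain-store matrix printed in section 2. The starting vector is hypothetical and is not the 1948 distribution of Table 1, which is not recoverable here. The Pearson statistic is computed on firm counts; the paper does not say how classes with tiny expected frequencies were treated, so dropping classes with zero expected frequency is an assumption of this sketch, as are the function names.

```python
import numpy as np

# Chain-store transition matrix as printed in the text (five size classes).
P_CHAINS = np.array([
    [.8950, .1050, 0, 0, 0],
    [.0110, .9420, .0470, 0, 0],
    [0, .0640, .8070, .1290, 0],
    [0, 0, .0210, .9370, .0420],
    [0, 0, 0, .0190, .9810],
])


def project(p0, P, years):
    """Distribution after `years` annual steps: row vector p0 times P**years."""
    return np.asarray(p0, dtype=float) @ np.linalg.matrix_power(np.asarray(P, dtype=float), years)


def chi_square(observed_counts, expected_props):
    """Pearson statistic comparing observed firm counts with projected shares.

    Classes with zero expected frequency are skipped (an assumption).
    """
    observed = np.asarray(observed_counts, dtype=float)
    expected = observed.sum() * np.asarray(expected_props, dtype=float)
    keep = expected > 0
    return float(((observed[keep] - expected[keep]) ** 2 / expected[keep]).sum())


# Illustrative only: a hypothetical starting distribution, NOT the 1948 figures.
p_start = [0.2, 0.2, 0.2, 0.2, 0.2]
p_after_10 = project(p_start, P_CHAINS, 10)  # 1948 -> 1958 is ten annual steps
print(np.round(p_after_10, 3))
```

The degrees of freedom reported in the footnote (four for the chains, five for the processors) correspond to one fewer than the number of size classes compared.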

The size distributions of the food chains and processors have been projected 
forward to 1968 and to equilibrium from the 1948 base, and these projections 
are also presented in Table 2. For the food chains, the difference between the 
projected 1958 and 1968 distributions is significant at the 10 per cent level, 
and the difference between 1968 and equilibrium is self-evident. Obviously, the 
continuation of the process of size shifting described by the matrix would 
increase significantly the percentage of food chains in the larger size classes. 
For the food processors, the differences between projections for 1958, 1968, and 





7 For 33 chains and 25 processors, computed values of chi-square are .6396 and 1.7368 with four and five degrees
of freedom, respectively. The probabilities of a chi-square variate assuming values less than or equal to these values 
are approximately .04 and .12, respectively. Projections were also made from a matrix constructed according to 
different size class limits, with very similar results. 
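The lower-tail probabilities in footnote 7 can be reproduced from the closed-form chi-square distribution function for integer degrees of freedom: 1 − e^(−x/2) Σ (x/2)^j/j! for even df, and an erf term minus a finite series for odd df. The function name is hypothetical; a library routine such as `scipy.stats.chi2.cdf(x, df)` should give the same values.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variate with integer df, in closed form."""
    if x <= 0:
        return 0.0
    half = x / 2
    if df % 2 == 0:
        # Even df: 1 - exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 0.0
        for j in range(df // 2):
            if j > 0:
                term *= half / j
            total += term
        return 1 - math.exp(-half) * total
    # Odd df: erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15 + ...)
    term, total = 1.0, 0.0
    for j in range(1, (df + 1) // 2):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erf(math.sqrt(half)) - math.sqrt(2 * x / math.pi) * math.exp(-half) * total


for stat, df in [(0.6396, 4), (1.7368, 5)]:
    print(f"chi-square {stat} on {df} df: P(X <= x) = {chi2_cdf(stat, df):.3f}")
```

The two values come out at about 0.041 and 0.116, in agreement with the footnote's .04 and .12.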







TABLE 2. ACTUAL AND PROJECTED RELATIVE FREQUENCY DISTRIBUTIONS OF FIRMS, FOOD CHAINS AND FOOD PROCESSORS

Columns: size class; for food chains and for food processors, the actual 1958 distribution and the distributions projected for 1958, 1968 and equilibrium (per cent).

a Less than 0.1 per cent.
b For food chains, the two largest size classes must be combined to obtain a regular matrix of transition probabilities.


equilibrium, although noticeable, are not statistically significant. In both in- 
dustries, it should be noted that the exclusion of entry produces a downward 
bias in the lower size classes and hence exaggerates somewhat the tendency 
toward increasing concentration in the larger classes. 
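The projections follow the usual Markov-chain rule. Let p_n be the row vector of relative frequencies over size classes and P the matrix of transition probabilities. Then the distribution k periods ahead is p_n P^k, and the equilibrium distribution π satisfies π = πP. The sketch below uses a hypothetical three-class matrix and base distribution, not the matrices estimated in the paper. It finds the equilibrium by repeated multiplication, which assumes P is regular; footnote b to Table 2 notes that for food chains the two largest classes had to be combined to meet this condition.

```python
def step(dist, P):
    """One period of the chain: row vector times transition matrix."""
    m = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]

def project(dist, P, periods):
    """Relative frequencies after `periods` transitions, i.e. dist * P**periods."""
    for _ in range(periods):
        dist = step(dist, P)
    return dist

def equilibrium(P, tol=1e-12, max_iter=1_000_000):
    """Steady state pi = pi P, found by iterating from a uniform start.
    Assumes P is regular (some power of P has all entries positive)."""
    m = len(P)
    dist = [1.0 / m] * m
    for _ in range(max_iter):
        new = step(dist, P)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    raise RuntimeError("no convergence; check that P is regular")

# Hypothetical transition matrix for three size classes (each row sums to 1)
P = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
base = [0.5, 0.3, 0.2]            # hypothetical base-year distribution
print(project(base, P, 10))       # ten periods ahead
print(equilibrium(P))             # close to [0.25, 0.5, 0.25] for this P
```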

The fact that obvious differences may be noted between the 1968 and equi- 
librium projections for the food chains suggests that the distribution is ap- 
proaching equilibrium only slowly, and this suspicion is confirmed by an in- 
spection of the eigenvalues of the matrix. Of course, different portions of the 
distribution are approaching equilibrium at different rates. Under these cir- 
cumstances (which, incidentally, apply to Mrs. Adelman’s application as well), 
the relevance of the steady state of the distribution for predictive purposes 
seems greatly reduced. It would appear highly unlikely that the transition 
probabilities, or even the relevant boundaries of the industries, would remain 
roughly stable over the required periods of more than a century. The implica- 
tion of this finding for the validity of the static equilibrium approach which has 
characterized most of the market models of micro-economics is obvious and 
has been spelled out in detail by Wolfe and Newman.⁸ The response of these
authors has been an attempt to reformulate micro-economic theory in terms of 
dynamic equilibrium; however, it may be that a more apt description would be 
cast in terms of a process analysis in which the concept of equilibrium plays 
only the role of “absolute zero,” a state useful as a reference point but rarely 
observed outside the laboratory. 


4. MOBILITY INDEX 


If economic processes are to be characterized in terms of transitional states 
rather than equilibrium or equilibrium-directed conditions, interest shifts to 





8 Newman, P., “The Erosion of Marshall’s Theory of Value,” Quarterly Journal of Economics, LXXIV (No- 
vember, 1960), 587-601; Wolfe, J. N. and Newman, “An Essay in the Theory of Value.” (Processed.) 





STATISTICAL ANALYSIS OF INDUSTRY STRUCTURE 929 


the parameters of the process of change and to indicators of the rate and char- 
acter of changes taking place. One such indicator is the index of mobility devel- 
oped by Mrs. Adelman, following the original suggestion of Dr. Prais.⁹ The
Adelman index relates the mobility of the observed distribution to that of a 
“perfectly mobile” distribution with the same steady state. A “perfectly mo- 
bile” distribution is defined as “one for which the probability that a firm will 
move from class A to class B during a single period is independent of A. With 
this definition, each column of the transition matrix P for a perfectly mobile 
industry of m size classes is composed of m identical positive numbers, and as 
usual, the sum of the elements of each row is unity.”¹⁰ The theoretical stand-
ard against which the observed data are to be compared is thus a matrix, each 
row of which corresponds to the steady state of the distribution. 

The observations offered above as to the predictive relevance of the equi- 
librium distribution suggest that this distribution may not be a relevant basis 
for assessment of the mobility of the observed industry. As an alternative, the 
initial configuration of the distribution, conceived as a perfectly mobile distri- 
bution in the sense described above, may be substituted in the index. Further, 
the value of the index for each size class may be weighted by the share of total 
size occurring in that class; that is, the significance of the mobility recorded for 
firms in each class may be treated as proportionate to the initial share of those 
firms in the total size of the group. Using Mrs. Adelman’s symbols, the revised 
index for time n may be written as: 


$$I^*_n = \sum_{j=0}^{m} \frac{1 - p_{jj}}{1 - s_j}\, w_j$$

where $p_{jj}$ is the probability of a firm in each size class remaining in that class, $s_j$ the relative frequency of firms in each class at time n, and $w_j$ the share of each class in the total size of the group at time n.¹¹
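Computationally, the revised index is a weighted sum over size classes. The sketch below assumes that the diagonal transition probabilities p_jj, the relative frequencies s_j, and the size shares w_j are already available as lists; the numbers in the example are hypothetical. If every p_jj equals s_j, the industry moves exactly as a perfectly mobile industry with the same initial configuration would. Each ratio is then 1, and the index equals the sum of the weights, namely 1.

```python
def revised_mobility_index(p_diag, s, w):
    """I*_n = sum over j of (1 - p_jj) / (1 - s_j) * w_j.

    p_diag: probability that a firm in class j stays in class j over the period
    s:      relative frequency of firms in class j at time n
    w:      share of class j in the total size of the group at time n
    """
    if not (len(p_diag) == len(s) == len(w)):
        raise ValueError("need one entry per size class in every list")
    return sum((1.0 - p) / (1.0 - sj) * wj for p, sj, wj in zip(p_diag, s, w))

# Hypothetical two-class example: (0.1/0.5)*0.4 + (0.2/0.5)*0.6 = 0.32
print(revised_mobility_index([0.9, 0.8], [0.5, 0.5], [0.4, 0.6]))

# Perfect mobility relative to the initial configuration gives an index of 1
print(revised_mobility_index([0.2, 0.3, 0.5], [0.2, 0.3, 0.5], [0.1, 0.3, 0.6]))
```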

Using this formula, an index of size mobility has been computed for the food 
chains and food processors for each of the years under study, and an average 
index for the period has been computed using mean relative frequencies of 
firms and sales share in each size class ($\bar{s}_j$ and $\bar{w}_j$). The results are presented in
Table 3. Evidently, the index discriminates clearly between the amount of size 
mobility experienced by each industry in each year and also discriminates 
between the mobility of the two industries over the entire period. The food 
processing firms have experienced approximately 8.5 per cent of the mobility 
which would have characterized a “perfectly mobile” industry of like configura- 
tion; and the food chains, somewhat more than 4 per cent. 

The relative variability of the two sets of annual indexes may be more readily 
compared when each index is expressed as a per cent of the relevant weighted 





9 Prais, S. J., “Measuring Social Mobility,” Journal of the Royal Statistical Society, Series A, 118 (July, 1955),
56-66.
10 Adelman, op. cit., p. 897.

11 The index in this form and the general approach of the Adelman model have been used in another paper.
See N. R. Collins and L. E. Preston, “The Structure of Food Processing Industries, 1935-1955,” Journal of Industrial 
Economics, vol. 9, no. 4 (July, 1961), 265-79. The results offered there are not comparable with those obtained 
here because of differences in the industry definition and size measure employed and in the number of firms and 
length of time periods under consideration. 







TABLE 3. INDEXES OF SIZE MOBILITY FOR FOOD CHAINS
AND FOOD PROCESSING FIRMS

                       Chains                          Processors
Period         Mobility    Per cent of        Mobility    Per cent of
               index       weighted average   index       weighted average

1948-49        .0326        75.3              .0258        30
1949-50        .0164        38                .2141       250
1950-51        .0307        71                .0319        37
1951-52        .0498       115                .1181       138
1952-53        .0061        14                .1053       123
1953-54        .0226        52                .0296        34
1954-55        .1832       426                .0618        72
1955-56        .0408        94                .0690        80
1956-57        .0171        39                .1301       152
1957-58        .0635       147                .0000         0
1948-1958      .0430       100.0              .0854       100.0























average for the decade, and these figures are also presented in Table 3. The 
deviations from the weighted averages are in the same directions for both 
industries in five of the years and in opposite directions in five. Although the 
chains show relatively high mobility during two recession periods (1954-55 
and 1957-58), both industries show high mobility during the boom year 1951-52;
hence, no evident relationship between mobility and the level of economic
activity is suggested. Correlations between the annual mobility indexes and 
annual profit rates, changes in profit rates, and changes in sales also fail to 
reveal significant results. It may well be, of course, that significant correla- 
tions would be obtained from a larger collection of data. 
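The comparison of annual deviations can be reproduced from the indexes printed in Table 3. The sketch below expresses each annual index as a per cent of the 1948-1958 weighted average. It then counts the years in which the two industries deviate from their averages in the same direction and in opposite directions. The computed percentages can differ slightly from the printed ones because the published indexes are rounded to four decimals.

```python
years = ["1948-49", "1949-50", "1950-51", "1951-52", "1952-53",
         "1953-54", "1954-55", "1955-56", "1956-57", "1957-58"]
chains = [.0326, .0164, .0307, .0498, .0061, .0226, .1832, .0408, .0171, .0635]
processors = [.0258, .2141, .0319, .1181, .1053, .0296, .0618, .0690, .1301, .0000]
CHAINS_AVG, PROCESSORS_AVG = .0430, .0854   # weighted averages, 1948-1958

def per_cent_of_average(index, average):
    """Annual index expressed as a per cent of the weighted average."""
    return 100.0 * index / average

def direction_counts(a, avg_a, b, avg_b):
    """Years whose deviations from the averages share / differ in sign."""
    same = opposite = 0
    for x, y in zip(a, b):
        product = (x - avg_a) * (y - avg_b)
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
    return same, opposite

for yr, c, p in zip(years, chains, processors):
    print(yr, round(per_cent_of_average(c, CHAINS_AVG)),
          round(per_cent_of_average(p, PROCESSORS_AVG)))
print(direction_counts(chains, CHAINS_AVG, processors, PROCESSORS_AVG))
```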


5. SIGNIFICANCE OF FINDINGS 


Without overestimating the substantive importance of these findings, cer- 
tain observations may be made. During the period under analysis, the chain 
stores included in the study increased in numbers from 26 to 33 and in share 
of total food stores sales from 24.6 per cent to 33.5 per cent. Most of this in- 
crease was due to the growth of the original 26 companies, which accounted for 
32.9 per cent of total sales in the later year. The number of food processors was 
held at 25, and there is no evidence that these firms as a group altered their 
share of total sales of the food processing industry significantly over the
period.¹² Although there has been only a slight increase in the degree of equality
of size among the chains over the decade, the increasing concentration of the 
food processing firms in the 2-to-3.99-per-cent-of-sales size class has resulted 
in a considerable reduction in the dispersion of the processing firms’ size dis- 
tribution and an increase in the degree of equality. However, if the projected 
tendencies of the distributions are even approximately correct—that is, if the 
transitional movements of the next decade are roughly similar to those in the 








12 U. S. Federal Trade Commission, op. cit., p. 93. This observation is supported by the larger study by Collins
and Preston, op. cit.







past—there is no reason to expect dramatic shifts in either of the distributions. 
This expectation takes no account, of course, of the impact of new firms or of 
growing small firms upon the industries in question. 

The relatively slight changes noted and immediately anticipated in the size 
distribution of the food chains and the low value of the mobility index obtained 
are at some variance with the general tenor of the FTC report, which implies 
that significant increases in concentration are occurring in food retailing, es- 
pecially as a result of the merger activity of the larger firms. The implication 
of the present analysis is that, although the larger firms as a group have grown 
relative to the industry, the relative size distribution of these firms has changed 
but little, nor has there been an unusually large amount of turnover in relative 
size among them. In the one period (1954-55) for which the mobility index for 
the food chains reached an unusually high value, only one of the firms (Winn- 
Dixie) experiencing a significant change in relative size was also involved in 
mergers. Although 1955 was the peak year of merger activity in terms of num- 
ber of stores acquired, the peak year of sales volume acquired through merger 
was not reached until 1958. Inspection of these data suggests the observation of 
Hart and Prais, based upon a half century of British experience, that amalga- 
mations play a direct role considerably less important than natural growth in 
explaining relative size changes among large firms.¹³ Of course, the indirect
effects of amalgamations upon the growth process are more difficult to assess. 

Certain disparities between the distributing and processing sectors of the 
food industries emerge rather clearly from the analysis. The food processors 
are clearly the more mobile group of firms; their size distribution is, however, 
more stable in over-all configuration and much nearer the equilibrium projec- 
tion than that of the food chains. Further, the processing industry has been 
generally less profitable and has experienced a less rapid growth in sales over 
the period.¹⁴ Hence, the processors show higher internal mobility, greater
over-all stability, and lower profits as compared to the lower internal mobility, lower
over-all stability, and higher profits of the food distributors. If profitability be 
accepted as a crude index of the strength of competition, it may be suggested 
that the more mobile industry is also the more competitive. It should also be 
noted that there is no obvious connection between the degree of internal mo- 
bility and the shape of the size distribution of firms, although long-run tenden- 
cies toward change in the size distribution are much more pronounced in the 
less mobile of the two industries under consideration here. 

The limited results obtained here perhaps raise more questions than they 
answer. However, these results serve, at minimum, to illustrate the descriptive 
and predictive usefulness of the technique offered by Mrs. Adelman and to sug- 
gest a number of hypotheses meriting investigation on a larger scale. This 
application of statistical analysis to changes in industry structure indicates the 
availability of a number of useful techniques for the detection of new and sig- 
nificant characteristics of a dynamic economy. 





13 P. E. Hart and Prais, “The Analysis of Business Concentration: A Statistical Approach,” Journal of the
Royal Statistical Society, Series A, 119 (October, 1956), 150-75.

14 For the decade 1948-1958, profits before taxes for the food chains were 14.3 per cent of total assets and
25.6 per cent of stockholders’ investment; for the processors, the comparable figures are 12.3 per cent and 19.5 per
cent. The largest food chains increased their sales 124 per cent during the period, while sales of the largest processors
increased only 40 per cent. U. S. Federal Trade Commission, op. cit., pp. 75-100.







BIBLIOGRAPHY 


Adelman, Irma G., “A Stochastic Analysis of the Size Distribution of Firms,” Journal of 
the American Statistical Association, vol. 53, no. 284 (December, 1958), pp. 893-904. 
Anderson, T. W., “Probability Models for Analyzing Time Changes in Attitudes,” Mathematical Thinking in the Social Sciences. Edited by P. F. Lazarsfeld. Glencoe, Illinois: Free Press, 1954.

Collins, N. R. and Preston, L. E., “The Structure of Food Processing Industries, 1935-
1955,” Journal of Industrial Economics, vol. 9, no. 4 (July, 1961), pp. 265-79. 

Hart, P. E. and Prais, S. J., “The Analysis of Business Concentration: A Statistical
Approach,” Journal of the Royal Statistical Society, Series A, vol. 119, no. 4 (October, 
1956), pp. 150-75. 

Newman, P., “The Erosion of Marshall’s Theory of Value,” Quarterly Journal of Economics, 
vol. LXXIV, no. 4 (November, 1960), pp. 587-601. 

Prais, S. J., “Measuring Social Mobility,” Journal of the Royal Statistical Society, Series
A, vol. 118, no. 3 (July, 1955), pp. 56-66. 

U. S. Federal Trade Commission, Economic Inquiry into Food Marketing, Part I. Wash- 
ington, 1960. 

Wolfe, J. N. and Newman, P., “An Essay in the Theory of Value.” (Processed.) 





NOTE ON THE MISSING PLOT PROCEDURE 
IN A RANDOMIZED BLOCK DESIGN 


John Leroy Folks
Oklahoma State University 
AND 
Det Lon West 
Texas Instruments Incorporated 


The properties of the missing plot procedure for a randomized block 
design are investigated from the randomization viewpoint. Expected 
mean squares, estimates of treatment effects, variances, and estimates 
of treatment effect variances are obtained when a particular block-plot 
combination is missing from the conceptual population of yields, when a 
block-treatment combination is missing, and when a restricted set of 
randomizations is considered. With proper definition of parameters 
and with certain assumptions, the analysis of variance is shown in 
all cases to lead to an unbiased test. Comparison with the infinite model 
approach is stressed. 


1. INTRODUCTION


In a case where an observation is missing from the results of a randomized
block design, the “exact” missing plot procedure is well known (see References)
with regard to the analysis of variance, estimates of treatment effects, and
variances of treatment effect estimates. However, when one considers the
randomization basis for the procedure, certain questions arise. Should one 
consider an observation missing in every conceptual experiment in the con- 
ceptual population of yields and, if so, which observation should it be? From 
the viewpoint of the practicing statistician it may be impossible to specify the 
most desirable conceptual population; however, it is felt important to examine 
various cases to determine which most nearly coincides with the infinite model 
approach. The following cases are considered in this paper. 

1. A block-plot combination missing. 

2. A block-treatment combination missing. 

3. A block-plot combination missing with conditional expectations. 


2. THE RANDOMIZATION ANALYSIS OF A RANDOMIZED BLOCK EXPERIMENT 


The material of this section is covered elsewhere (Kempthorne [2]) but is 
included in an effort to make the paper self-contained. Suppose that s treat- 
ments are to be applied at random to each of r blocks containing s experimental 
units, or plots. There are a total of $(s!)^r$ possible ways in which the treatments
might be assigned, and for each arrangement there is a conceptual set of
experimental results. Denote the conceptual yield of treatment k on the jth plot
in the ith block by $z_{ijk}$. The contributing factors to $z_{ijk}$ are emphasized by
considering the identity


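$$z_{ijk} = \bar z_{\cdot\cdot\cdot} + (\bar z_{i\cdot\cdot} - \bar z_{\cdot\cdot\cdot}) + (z_{ijk} - \bar z_{ij\cdot}) + (\bar z_{ij\cdot} - \bar z_{i\cdot\cdot}) = \mu + b_i + t_k + e_{ij} \qquad (1)$$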







where
$$t_k = z_{ijk} - \bar z_{ij\cdot}$$
and
$$e_{ij} = \bar z_{ij\cdot} - \bar z_{i\cdot\cdot},$$
and where the dots indicate averages over the respective subscripts. Note that
the definition of $t_k$ assumes additivity of treatment and experimental units; that
is, $z_{ijk} - \bar z_{ij\cdot}$ does not depend upon i and j.

When a randomized experiment is performed, only a subset of the $z_{ijk}$'s are
observed, the particular subset being determined by the randomization. Let
$\delta^k_{ij} = 1$ if treatment k is assigned to plot j of the ith block and $\delta^k_{ij} = 0$ otherwise.
Then the observed yields are given by


$$y_{ik} = \sum_j \delta^k_{ij} z_{ijk} = \mu + b_i + t_k + \sum_j \delta^k_{ij} e_{ij} \qquad (2)$$


Note that $y_{ik}$ is a random variable in that it contains the random variables
$\delta^k_{ij}$. By considering the distribution of the deltas, it can be shown that
$$E(\bar y_{\cdot k} - \bar y_{\cdot k'}) = t_k - t_{k'}$$
and that the average variance of treatment differences is given by
$$\frac{2}{r^2(s-1)} \sum_i \sum_j e_{ij}^2.$$
Likewise, one can find the expectations in the analysis of variance. Thus it
can be seen that the A.O.V. gives an unbiased estimate of error in that the
expectation of (2/r)(Error mean square) is
$$\frac{2}{r^2(s-1)} \sum_i \sum_j e_{ij}^2,$$
which is the average variance of treatment differences. Note also that under
the null hypothesis of equal treatment effects, the expected treatment mean
square equals the expected error mean square. These expected mean squares
are given in Table 1.
TABLE 1. ANALYSIS OF VARIANCE: RANDOMIZED BLOCK

Source        d.f.              Expected Mean Squares

Blocks        r - 1
Treatments    s - 1             $\frac{1}{r(s-1)}\sum_i\sum_j e_{ij}^2 + \frac{r}{s-1}\sum_k t_k^2$
Error         (r - 1)(s - 1)    $\frac{1}{r(s-1)}\sum_i\sum_j e_{ij}^2$
Total         rs - 1
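For a small design, the randomization results in Table 1 can be checked by brute force: with s treatments and r blocks there are only (s!)^r equally likely assignments. The sketch below enumerates all of them for hypothetical additive yields z_ijk = μ + b_i + t_k + e_ij; the values of μ, b_i, t_k, and e_ij are invented for illustration. It returns the exact randomization mean and variance of the difference between the first two treatment means, together with the expected treatment and error mean squares. These are then compared with the formulas above.

```python
from itertools import permutations, product

def randomization_moments(mu, b, t, e):
    """Enumerate all (s!)**r randomizations of a randomized block design.

    Conceptual yields are additive: z[i][j][k] = mu + b[i] + t[k] + e[i][j],
    with sum(t) == 0 and each row of e summing to zero. Returns the exact mean
    and variance of (treatment 1 mean - treatment 2 mean) and the expected
    treatment and error mean squares over the randomization distribution.
    """
    r, s = len(e), len(t)
    diffs, ms_trt, ms_err = [], [], []
    for perms in product(permutations(range(s)), repeat=r):
        # perms[i][k] is the plot of block i that receives treatment k
        y = [[mu + b[i] + t[k] + e[i][perms[i][k]] for k in range(s)]
             for i in range(r)]
        grand = sum(map(sum, y)) / (r * s)
        blk = [sum(row) / s for row in y]
        trt = [sum(y[i][k] for i in range(r)) / r for k in range(s)]
        ss_tot = sum((v - grand) ** 2 for row in y for v in row)
        ss_blk = s * sum((m - grand) ** 2 for m in blk)
        ss_trt = r * sum((m - grand) ** 2 for m in trt)
        ss_err = ss_tot - ss_blk - ss_trt
        diffs.append(trt[0] - trt[1])
        ms_trt.append(ss_trt / (s - 1))
        ms_err.append(ss_err / ((r - 1) * (s - 1)))
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / n
    return mean_d, var_d, sum(ms_trt) / n, sum(ms_err) / n

# Hypothetical example with r = 3 blocks and s = 3 treatments
e = [[1.0, -2.0, 1.0], [0.5, 0.5, -1.0], [2.0, -1.0, -1.0]]
t = [1.0, 0.0, -1.0]
mean_d, var_d, e_ms_trt, e_ms_err = randomization_moments(10.0, [1.0, 0.0, -1.0], t, e)

r, s = len(e), len(t)
sum_e2 = sum(v * v for row in e for v in row)
print(mean_d, t[0] - t[1])                                # expectation of the difference
print(var_d, 2 * sum_e2 / (r ** 2 * (s - 1)))             # average variance formula
print(e_ms_err, sum_e2 / (r * (s - 1)))                   # Error line of Table 1
print(e_ms_trt, sum_e2 / (r * (s - 1)) + r * sum(x * x for x in t) / (s - 1))
```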











MISSING PLOTS IN RANDOMIZED BLOCKS 935 


Some important principles derived from looking at the results of an experi- 
ment from the randomization perspective are: 


1. The conceptual rs² yields are fixed quantities, not random variables.
2. The random variables are introduced by the randomization itself.
3. The parameters of the model are defined in terms of the finite population
of conceptual yields.
4. The assumption of additivity is stronger than the assumption of no block-
treatment interaction.


3. BLOCK-PLOT COMBINATION MISSING 


From some points of view, this case is the most realistic of the three discussed 
in this paper. For example, in animal experiments there may be a particular 
animal which is “fated” to die before the experiment terminates, or in agricul- 
tural field experiments, there may be a particular plot which is flooded and 
never gives a yield. However, if this is the situation envisaged, the treatments 
would occur on this plot with equal frequency over the conceptual population 
of randomizations, and all treatment differences would be estimated with the 
same variance. Thus the variances of treatment differences as derived from 
infinite model theory would be improper. 

Consider the conceptual population of yields $z_{ijk}$, where all yields on a particular
block-plot combination are missing. Without loss of generality, take these
to be $z_{11k}$, k = 1, 2, ..., s.

One of the principles emphasized in the preceding section was that the 
parameters of the model were defined in terms of the population of conceptual 
yields. In the present instance, the original population has been truncated; 
thus, we cannot use the parameter definitions given in equation (1). The fol- 
lowing set of definitions is one way of defining parameters which makes possible 
useful results. As before, let a dot indicate averaging over the appropriate subscript,
with a prime indicating that j = 1 is excluded. Let


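$$\mu = \Big[\bar z'_{1\cdot\cdot} + \sum_{i \neq 1} \bar z_{i\cdot\cdot}\Big] \Big/ r$$
$$b_1 = \bar z'_{1\cdot\cdot} - \mu$$
$$b_i = \bar z_{i\cdot\cdot} - \mu, \qquad i \neq 1$$
$$e_{ij} = \bar z_{ij\cdot} - \bar z_{i\cdot\cdot}, \qquad i \neq 1$$
$$e_{1j} = \bar z_{1j\cdot} - \bar z'_{1\cdot\cdot}, \qquad j \neq 1$$
$$t_k = z_{ijk} - \bar z_{ij\cdot}$$
Then
$$z_{ijk} = \mu + b_i + t_k + e_{ij}. \qquad (3)$$
Further, with the above model
$$\sum_i b_i = 0, \qquad \sum_k t_k = 0, \qquad \sum_{j \neq 1} e_{1j} = 0, \qquad \sum_j e_{ij} = 0 \;\; (i \neq 1).$$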


The observed yields y are given by 










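$$y_{ik} = \sum_j \delta^k_{ij} z_{ijk} = \mu + b_i + t_k + \sum_j \delta^k_{ij} e_{ij}, \qquad i \neq 1$$
$$y_{1k} = \mu + b_1 + t_k + \sum_{j \neq 1} \delta^k_{1j} e_{1j} + \frac{s\, \delta^k_{11} \sum_{i \neq 1} \sum_j \delta^k_{ij} e_{ij}}{(r-1)(s-1)} \qquad (5)$$
By consideration of the distribution of the deltas, expected values may be
obtained.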
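The last term of equation (5) comes from substituting the usual missing-plot estimate for the yield on the missing plot whenever treatment k falls there. That estimate is x = (rB + sT − G)/((r − 1)(s − 1)), where B, T, and G are the observed totals for the block, for the treatment, and for the whole experiment (Yates [5]; Cochran and Cox [1]). The sketch below uses hypothetical data to show two properties of the formula. It reproduces the true value when yields are exactly additive. It also leaves a zero block-treatment residual in the filled cell, which is why the filled table reproduces the exact least-squares analysis.

```python
def missing_plot_estimate(y, i0, k0):
    """Yates' estimate for one missing cell y[i0][k0] of an r x s
    block-by-treatment table; the value stored in that cell is ignored."""
    r, s = len(y), len(y[0])
    B = sum(y[i0][k] for k in range(s) if k != k0)          # observed block total
    T = sum(y[i][k0] for i in range(r) if i != i0)          # observed treatment total
    G = sum(y[i][k] for i in range(r) for k in range(s)
            if (i, k) != (i0, k0))                          # observed grand total
    return (r * B + s * T - G) / ((r - 1) * (s - 1))

def cell_residual(y, i, k):
    """y_ik - block mean - treatment mean + grand mean, for a complete table."""
    r, s = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * s)
    return y[i][k] - sum(y[i]) / s - sum(row[k] for row in y) / r + grand

# Hypothetical data: 4 blocks x 3 treatments, cell (0, 0) missing
y = [[None, 12.0, 9.0],
     [11.0, 14.5, 8.0],
     [13.0, 15.0, 10.5],
     [9.5, 12.0, 7.0]]
x = missing_plot_estimate(y, 0, 0)
filled = [row[:] for row in y]
filled[0][0] = x
print(x, cell_residual(filled, 0, 0))   # 10.5, and a residual of zero up to rounding
```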
Define intrablock variances by
$$\sigma_1^2 = \sum_{j \neq 1} e_{1j}^2 / (s-2)$$
$$\sigma_i^2 = \sum_j e_{ij}^2 / (s-1), \qquad i \neq 1$$
$$\bar\sigma^2 = \sum_i \sigma_i^2 / r. \qquad (6)$$


The expectations in the analysis of variance are as given in Table 2. Under the
assumption of homogeneous error variances, i.e.
$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_r^2,$$
the test for treatments is unbiased in the sense that E(Treatment Mean Square)
= E(Error Mean Square) under the null hypothesis that $t_k = 0$, k = 1, 2, ..., s.


TABLE 2. ANALYSIS OF VARIANCE FOR BLOCK-PLOT
COMBINATION MISSING

Source        d.f.           Expected Mean Square

Treatments    s - 1          [expression not legible in this copy]
Error         rs - r - s     [expression not legible in this copy]














2 r 
r \s mr — Xs — af 







and with homogeneous error variances is estimated unbiasedly by
$$\frac{2}{r}(\text{Error Mean Square})\left[1 + \frac{1}{(r-1)(s-1)}\right].$$


It should be emphasized that the population of conceptual yields excludes $z_{11k}$,
k = 1, 2, ..., s, and that the model is defined in terms of this incomplete population.
Consequently, the term $e_{11}$ does not appear in the model or in any of the
expectations. The problem could be formulated in a different manner, however. 
One could consider all conceptual yields to be defined, but certain yields miss- 
ing in every conceptual experiment. This, in fact, is the approach taken by 
Mitra [4]. The expected values obtained are identical in the two cases, but 
look different because they are given in terms of different models. 


4. BLOCK-TREATMENT COMBINATION MISSING 


Upon first examination, this case may appear to many to be unrealistic. How- 
ever, it is conceivable that the physical properties of a certain block are such 
that a particular treatment will always result in a missing yield. Also it should 
be noted that this case is worthy of investigation because the infinite model ap- 
proach is concerned with a block-treatment combination missing. 

Consider the conceptual population of yields $z_{ijk}$. Without loss of generality,
suppose that treatment 1 is always missing in block 1, that is, $z_{1j1}$ is missing for every j. We wish to define our model in terms of the yields
which are available to us. As noted previously, there are many ways in which 
this can be done. The following is one which appears to be useful. 

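Denote sums and averages, excluding k = 1, by an asterisk.

Let
$$\mu^* = \bar z^*_{\cdot\cdot\cdot}$$
$$b^*_i = \bar z^*_{i\cdot\cdot} - \bar z^*_{\cdot\cdot\cdot}$$
$$e^*_{ij} = \bar z^*_{ij\cdot} - \bar z^*_{i\cdot\cdot}$$
$$t^*_k = z_{ijk} - \bar z^*_{ij\cdot}$$
$$z_{ijk} = \mu^* + b^*_i + t^*_k + e^*_{ij}$$
$$\sum_i b^*_i = 0, \qquad \sum_j e^*_{ij} = 0, \qquad \sum_{k \neq 1} t^*_k = 0.$$
In this case all $e^*_{ij}$ are defined. Let
$$\sigma_i^{*2} = \sum_j e_{ij}^{*2} / (s-1)$$
$$\bar\sigma^{*2} = \sum_i \sigma_i^{*2} / r.$$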


The expectations in the analysis of variance are given in Table 3. 










TABLE 3. ANALYSIS OF VARIANCE FOR BLOCK-TREATMENT
COMBINATION MISSING

Source        d.f.           Expected Mean Square

Treatments    s - 1          [expression not legible in this copy]
Error         rs - r - s     [expression not legible in this copy]














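Under the assumption of homogeneity of error, i.e.
$$\sigma_1^{*2} = \sigma_2^{*2} = \cdots = \sigma_r^{*2} = \sigma^{*2},$$
$$E[\text{Treatment (MS)}] = \sigma^{*2} + \frac{r}{s-1}\sum_k t_k^{*2} - \frac{s+r-1}{s(s-1)}\, t_1^{*2} \qquad (10)$$
$$E[\text{Error (MS)}] = \sigma^{*2}. \qquad (11)$$
Thus with homogeneity of error the test is unbiased. Further, with homogeneity
of error,
$$\mathrm{Var}(\hat t^*_p - \hat t^*_q) = 2\sigma^{*2}/r, \qquad p, q \neq 1 \qquad (12)$$
$$\mathrm{Var}(\hat t^*_1 - \hat t^*_q) = \sigma^{*2}\left[\frac{2}{r} + \frac{s}{r(r-1)(s-1)}\right], \qquad q \neq 1. \qquad (13)$$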


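Consider the infinite model
$$y_{ik} = \mu + b_i + t_k + e_{ik}$$
where
$$\sum_i b_i = 0, \qquad \sum_k t_k = 0,$$
and
$$e_{ik} \sim \mathrm{NID}(0, \sigma^2).$$
Suppose that $y_{11}$ is missing. Then it can be shown that
$$E[\text{Treatment (MS)}] = \sigma^2 + \frac{r}{s-1}\sum_k t_k^2 - \frac{s}{(s-1)^2}\, t_1^2 \qquad (14)$$
$$E[\text{Error (MS)}] = \sigma^2. \qquad (15)$$
Further, it can be shown that
$$\mathrm{Var}(\hat t_p - \hat t_q) = 2\sigma^2/r, \qquad p, q \neq 1 \qquad (16)$$
and
$$\mathrm{Var}(\hat t_1 - \hat t_q) = \sigma^2\left[\frac{2}{r} + \frac{s}{r(r-1)(s-1)}\right], \qquad q \neq 1. \qquad (17)$$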
Thus the randomization results of this section are in agreement with the vari- 
ance formulas derived from the infinite model approach. 
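Equations (16) and (17) can be checked numerically without distribution theory. Once the missing cell is filled by the missing-plot formula, an estimated treatment difference is a linear function of the observed yields. Under the infinite model its variance is therefore σ² times the sum of its squared coefficients. The sketch below recovers those coefficients by applying the estimator to unit data tables. The function names are illustrative. Cell (0, 0) plays the role of the missing $y_{11}$, so column 0 is treatment 1 of the text.

```python
def fill_missing(y, i0=0, k0=0):
    """Copy of y with cell (i0, k0) replaced by Yates' missing-plot estimate."""
    r, s = len(y), len(y[0])
    B = sum(y[i0][k] for k in range(s) if k != k0)
    T = sum(y[i][k0] for i in range(r) if i != i0)
    G = sum(y[i][k] for i in range(r) for k in range(s) if (i, k) != (i0, k0))
    filled = [row[:] for row in y]
    filled[i0][k0] = (r * B + s * T - G) / ((r - 1) * (s - 1))
    return filled

def estimated_difference(y, p, q):
    """Difference of treatment means after filling the missing cell (0, 0)."""
    f = fill_missing(y)
    return (sum(row[p] for row in f) - sum(row[q] for row in f)) / len(f)

def variance_factor(r, s, p, q):
    """Var(t_p_hat - t_q_hat) / sigma**2 for independent errors of variance sigma**2."""
    total = 0.0
    for i in range(r):
        for k in range(s):
            if (i, k) == (0, 0):
                continue                      # the missing yield has no coefficient
            unit = [[0.0] * s for _ in range(r)]
            unit[i][k] = 1.0
            total += estimated_difference(unit, p, q) ** 2
    return total

r, s = 4, 3
print(variance_factor(r, s, 1, 2), 2 / r)                                 # equation (16)
print(variance_factor(r, s, 0, 1), 2 / r + s / (r * (r - 1) * (s - 1)))   # equation (17)
```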


5. BLOCK-PLOT COMBINATION MISSING: CONDITIONAL EXPECTATIONS


It has been suggested by some that we should condition our analysis upon 
the fact that a particular treatment did occur on a particular plot. This, of 
course, amounts to analyzing our experiment in terms of a restricted set of
possible randomizations, whereas the two preceding cases do not. As might be
expected, the expectations and variances over this restricted set of
randomizations are very similar to those in the preceding section.

Using the model defined in equation (3) and intrablock variances as in equa- 
tion (6), expectations are given in Table 4. Note that this differs from Table 2 
only in the treatment portion of the treatment mean square. Hence with ho- 
mogeneous error variances, the test is unbiased. With homogeneity of error 
variances, the variances of treatment differences are as given in equations (12)
and (13), although $\sigma^2$ is defined differently in that section than it is here. Thus
the randomization results of this section agree completely with the results from 
the infinite model as given in equations (14), (15), (16), and (17). 


TABLE 4. CONDITIONAL ANALYSIS OF VARIANCE FOR BLOCK-PLOT
COMBINATION MISSING

Source        d.f.           Expected Mean Square

Treatment     s - 1          [expression not legible in this copy]
Error         rs - r - s     [expression not legible in this copy]








6. DISCUSSION 


From most viewpoints a probability statement about an experimental result 
implies the concept of repeating the experiment many times. It was felt that it
was unclear to which conceptual population of experiments the missing
plot procedure was meant to apply. Additivity of treatment and experimental unit
effects has been assumed throughout this paper. The randomization analysis with
non-additivity has been explored in the case of no missing data. The effect of 
non-additivity upon the missing plot procedure is under investigation. 







7. APPENDIX 


The variance formulas and expected mean squares in this paper are obtained 
by finding the expectation of quadratic forms in the deltas. However, they can 
be obtained more readily by following the elegant approach used by Mitra [4] 
which we shall describe here for the case of a block-treatment combination miss- 
ing. 

Suppose that the block-treatment combination (1, 1) is missing and that we 
denote its estimate obtained from the missing-plot formula by x. Consider the
expected sums of squares in the analysis of variance when the data are
complete, when the data are augmented by x, and in the exact case. Denote these
expected sums of squares as shown in Table 5.


TABLE 5. NOTATION FOR EXPECTED SUMS OF SQUARES

Source        Complete       Augmented      Exact

Total         (Total)_c      (Total)_a      (Total)_e
Blocks        (B)_c          (B)_a          (B)_e
Treatments    (T)_c          (T)_a          (T)_e
Error         (E)_c          (E)_a          (E)_e














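It can be shown that
$$(\text{Total})_a = (\text{Total})_c + \frac{rs-1}{rs}E(x - y_{11})^2 + 2E(x - y_{11})(y_{11} - \bar y_{\cdot\cdot})$$
$$(B)_a = (B)_c + \frac{r-1}{rs}E(x - y_{11})^2 + 2E(x - y_{11})(\bar y_{1\cdot} - \bar y_{\cdot\cdot})$$
$$(T)_a = (T)_c + \frac{s-1}{rs}E(x - y_{11})^2 + 2E(x - y_{11})(\bar y_{\cdot 1} - \bar y_{\cdot\cdot})$$
$$(E)_e = (E)_a = (E)_c - \frac{(r-1)(s-1)}{rs}E(x - y_{11})^2.$$
Since $(E)_c$ is already known, $(E)_e = (E)_a$ is readily found to be as given in Table 3.
Also
$$(\text{Total})_e - (B)_e = (\text{Total})_c - (B)_c - \frac{s-1}{s}E(y_{11} - \bar y'_{1\cdot})^2,$$
where $\bar y'_{1\cdot}$ is the mean of the remaining s - 1 yields in block 1, and $(T)_e$ is readily found to be as given in Table 3.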
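The two sums-of-squares relations used in the appendix hold for every data set, not only in expectation, so they are easy to confirm numerically. The sketch below uses a hypothetical complete table and then treats cell (0, 0), playing the role of $y_{11}$, as missing. It checks two things. First, the augmented error sum of squares equals the complete-data value minus (r − 1)(s − 1)/(rs) · (x − y₁₁)². Second, deleting y₁₁ lowers the within-block sum of squares (Total minus Blocks) by (s − 1)/s · (y₁₁ − ȳ′₁.)².

```python
def anova_ss(y):
    """Total, block, treatment, and error sums of squares of a complete r x s table."""
    r, s = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * s)
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_blk = s * sum((sum(row) / s - grand) ** 2 for row in y)
    ss_trt = r * sum((sum(row[k] for row in y) / r - grand) ** 2 for k in range(s))
    return ss_tot, ss_blk, ss_trt, ss_tot - ss_blk - ss_trt

def missing_plot_estimate(y, i0, k0):
    """Yates' estimate for cell (i0, k0), using only the other cells."""
    r, s = len(y), len(y[0])
    B = sum(y[i0][k] for k in range(s) if k != k0)
    T = sum(y[i][k0] for i in range(r) if i != i0)
    G = sum(y[i][k] for i in range(r) for k in range(s) if (i, k) != (i0, k0))
    return (r * B + s * T - G) / ((r - 1) * (s - 1))

def within_block_ss(rows):
    """Sum over blocks of squared deviations from the block mean (blocks may differ in size)."""
    return sum(sum((v - sum(row) / len(row)) ** 2 for v in row) for row in rows)

# Hypothetical complete data; y[0][0] plays the role of y_11
y = [[10.0, 12.0, 9.0],
     [11.0, 14.5, 8.0],
     [13.0, 15.0, 10.5],
     [9.5, 12.0, 7.0]]
r, s = len(y), len(y[0])
x = missing_plot_estimate(y, 0, 0)
augmented = [row[:] for row in y]
augmented[0][0] = x

tot_c, blk_c, _, err_c = anova_ss(y)
err_a = anova_ss(augmented)[3]
print(err_a, err_c - (r - 1) * (s - 1) / (r * s) * (x - y[0][0]) ** 2)

exact_rows = [y[0][1:]] + y[1:]                 # y_11 deleted
mean_rest = sum(y[0][1:]) / (s - 1)
print(within_block_ss(exact_rows), tot_c - blk_c - (s - 1) / s * (y[0][0] - mean_rest) ** 2)
```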


REFERENCES 


[1] Cochran, William G. and Cox, Gertrude M., Experimental Designs, Second Edition, 
New York: John Wiley and Sons, Inc., 1957. 







[2] Kempthorne, Oscar, “The Randomization Theory of Experimental Inference,”
Journal of the American Statistical Association, 50 (1955) 946-67.

[3] Kempthorne, Oscar, The Design and Analysis of Experiments. New York, New York: 
John Wiley and Sons, Inc., 1952. 

[4] Mitra, Sujit Kumar, “Some Remarks on the Missing Plot Analysis,” Sankhya, 21 
(1959) 337-44. 

[5] Yates, Frank, “The Analysis of Replicated Experiments When the Field Results are
Incomplete,” Empire Journal of Experimental Agriculture, 1 (1933) 129-42.





GAMMA DISTRIBUTION IN ACCEPTANCE 
SAMPLING BASED ON LIFE TESTS 


Shanti S. Gupta¹ and Phyllis A. Groll²
Bell Telephone Laboratories, Inc. 


The gamma distribution is assumed as a model for lifetime. The 
problem of acceptance sampling when the life test is truncated at a pre- 
assigned time is discussed. For various acceptance numbers, various 
confidence levels, and various values of the ratio of the fixed experiment 
time to the specified mean life, the tables of this paper give the mini- 
mum sample sizes necessary to assure the specified mean life. The op- 
erating characteristic functions of the sampling plans are obtained and 
these are graphed for a few selected cases. Producer’s risk is discussed 
and a small table is given for the ratio of true mean life to a specified 
mean life which insures acceptance with a probability of P* =.95. An 
approximation is given for the minimum sample size problem. Failure 
rates for the gamma distribution are tabulated. The use of the tables 
is illustrated by examples in most cases. 


1. INTRODUCTION 


Apart from its theoretical interest, the gamma (Pearson Type III) distribution
has been studied in connection with some reliability, life test
and fatigue test problems. The availability of the tables of the probability 
integral and the percentage points of the gamma distribution in the tables 
edited by Pearson [16] and by Pearson and Hartley [15, Tables 7 and 8] has 
facilitated the use of this distribution in connection with the fitting and 
graduation of skew data. The tables of the χ² distribution (a special case of the
gamma distribution) are included in most standard statistical texts and tables. 
The percentage points of the gamma distribution for integer values of the 
shape parameter can also be found in a paper by G. A. Campbell [2] which 
was published in 1923 and which is perhaps the first tabulation of the inverse 
gamma function. Quantiles of the gamma distribution have also recently been
tabulated by Wilk, Gnanadesikan, and Huyett [18], who also discuss
probability plots for this family of distributions.

The gamma distribution, which includes the χ² and exponential distributions
as special cases, is derived by Birnbaum and Saunders [1] as one of the
statistical models for the life-length of materials. The use of this distribution as a
model for some reliability problems is mentioned by Drenick [5] and Herd
[12]. Methods for estimating the parameters of the gamma distribution are 
discussed by Chapman [3], Greenwood and Durand [9], Wilk, Gnanadesikan, 
and Huyett [19] and Gupta [10]. Order statistics from the gamma
distribution, together with some applications, are discussed by Gupta [10].

In the present paper it is assumed that the probability distribution of life 
is a gamma distribution and that its shape parameter r is known. The problem 
considered is that of finding the minimum sample size necessary to assure a 
certain mean life when the life test is truncated at a preassigned time t and





1 Currently visiting the Department of Statistics, Stanford University, Palo Alto, California. 
2 Presently at the Western Development Laboratories of Philco Corporation, Palo Alto, California.







GAMMA DISTRIBUTION IN ACCEPTANCE SAMPLING 943 


when the observed number of failures does not exceed a given acceptance 
number. The decision procedure is to accept a lot only if the specified mean 
life can be established with a preassigned (high) probability P*, which pro- 
vides the consumer’s protection. It should be noted that the decision to accept 
can take place only at the end of time t and only if the number of failures does
not exceed the given acceptance number c. The life test experiment can be
discontinued at the time the (c+1)th failure is observed or at time t, whichever
is earlier. In the first case the only decision that can be arrived at
is to reject the lot. Truncated life tests of this type under the assumption of 
exponential lifetime distribution are discussed by Epstein [6] and by Sobel 
and Tischendorf [17]. Sobel and Tischendorf [17] developed the sampling 
plans based on life tests for the exponential distribution and the present paper 
generalizes these to the gamma distribution. In a recent paper Goode and Kao 
[8] have applied the Sobel and Tischendorf procedure to the Weibull distribu- 
tion. 

Tables IA, B, C, D and E at the end of this paper give the minimum sample sizes necessary for various acceptance numbers c, for various confidence levels P*, and for various ratios of the test time t to the specified mean life rθ₀. In Section 2, we state some known properties of the gamma distribution and briefly discuss the estimation of its parameters. In Section 3, we obtain the minimum sample sizes and discuss the operating characteristic functions of the corresponding acceptance sampling plans. Section 4 describes an approximation for the minimum sample sizes; the approximate sample size can be obtained from Table II at the end of the paper. Failure rates are discussed in Section 5, and Table IV provides these failure rates. Along with a description of the tables accompanying the paper, some illustrations are given in Section 6.


2. CHARACTERISTICS OF GAMMA DISTRIBUTION AS A MODEL FOR LIFE 


The gamma distribution as a two-parameter family of distributions has the probability density function

$$ g_r(t,\theta) = \frac{1}{\theta\,\Gamma(r)}\, e^{-t/\theta} \left(\frac{t}{\theta}\right)^{r-1}, \qquad 0 < t < \infty,\ \theta > 0,\ r > 0. \tag{2.1} $$


[A three-parameter gamma distribution involves an origin parameter a; its density function is obtained from (2.1) by replacing t by (t - a).] It should be pointed out that the random variable 2t/θ in (2.1) has a central χ² distribution with 2r degrees of freedom. The cumulative distribution function (c.d.f.) is given by

$$ G_r(t,\theta) = \int_0^t g_r(x,\theta)\,dx, \tag{2.2} $$

which reduces to

$$ G_r(t,\theta) = 1 - \sum_{j=0}^{r-1} e^{-t/\theta}\,\frac{(t/\theta)^j}{j!} \quad \text{if } r \text{ is an integer,} \tag{2.3} $$





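and to

$$ G_r(t,\theta) = \operatorname{erf}\sqrt{t/\theta} - \frac{\sqrt{t/\theta}\; e^{-t/\theta}}{\sqrt{\pi}} \sum_{j=1}^{r-1/2} \frac{2^j\,(t/\theta)^{j-1}}{1\cdot 3\cdots(2j-1)} \quad \text{if } r \text{ is a half-integer,} \tag{2.4} $$

where erf(x) is the error function

$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-y^2}\,dy, $$

and where the second term on the right side of (2.4) is equal to zero for r = ½.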
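The closed forms (2.3) and (2.4) are easy to evaluate directly. The Python sketch below is one possible implementation using only the standard library (`math.erf` supplies the error function); the function name `gamma_cdf` is an illustrative choice, not something defined in the paper.

```python
import math


def gamma_cdf(t: float, theta: float, r: float) -> float:
    """G_r(t, theta) for integer r via (2.3) or half-integer r via (2.4)."""
    x = t / theta
    if float(r).is_integer() and r >= 1:
        # (2.3): 1 - e^{-x} * sum_{j=0}^{r-1} x^j / j!
        term, total = 1.0, 0.0
        for j in range(int(r)):
            if j > 0:
                term *= x / j  # term is now x^j / j!
            total += term
        return 1.0 - math.exp(-x) * total
    if float(2 * r).is_integer() and r > 0:
        # (2.4): erf(sqrt x) - sqrt(x) e^{-x} / sqrt(pi)
        #        * sum_{j=1}^{r-1/2} 2^j x^{j-1} / (1*3*...*(2j-1))
        k = int(r - 0.5)  # number of terms in the sum; 0 when r = 1/2
        term, total = 0.0, 0.0
        for j in range(1, k + 1):
            term = 2.0 if j == 1 else term * 2.0 * x / (2 * j - 1)
            total += term
        return (math.erf(math.sqrt(x))
                - math.sqrt(x) * math.exp(-x) / math.sqrt(math.pi) * total)
    raise ValueError("r must be a positive integer or half-integer")
```

Because G_r depends on t and θ only through t/θ, calling it with (t, θ) = (0.2, 1) and r = 2 reproduces the value .0175231 that appears later with Table IIB.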
When r is an integer, G_r(t, θ) can be obtained from a table of the cumulative Poisson distribution [13], since by (2.3) 1 - G_r(t, θ) is the probability that a Poisson variable with mean t/θ does not exceed r - 1. The density functions are sketched for a few values of r in Figure I. For r = 1, the gamma distribution reduces to the exponential. For r > 1, the density functions are positively skewed curves which start at the origin and which tend to the normal distribution as r tends to infinity. For r < 1, the axes are asymptotes to the density curves. The modal value is t = (r - 1)θ for r > 1. The moments of the distribution are


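$$ \mu'_a = E\,t^a = \theta^a (a+r-1)(a+r-2)\cdots r, \tag{2.5} $$

so that

$$ \mu'_1 = \text{Mean} = r\theta, \quad \mu_2 = \operatorname{Var} t = r\theta^2, \quad \mu_3 = 2r\theta^3, \quad \mu_4 = 3r(r+2)\theta^4, \tag{2.6} $$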


which give for the measures γ₁ and γ₂ of skewness and kurtosis (excess), respectively,

$$ \gamma_1 = \frac{2}{\sqrt{r}}, \qquad \gamma_2 = \frac{6}{r}. $$

Both γ₁ and γ₂ → 0 as r → ∞, indicating that the skewness and kurtosis (excess) decrease to the values of the normal distribution.
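The step from (2.5) to (2.6), and from there to γ₁ and γ₂, is routine algebra on raw moments. As an illustrative check (not part of the original paper), the sketch below converts the raw moments of (2.5) into central moments with exact rational arithmetic; γ₁ is returned squared so that every quantity stays rational. The function names are hypothetical.

```python
from fractions import Fraction


def raw_moment(a: int, r: Fraction, theta: Fraction) -> Fraction:
    """mu'_a = theta^a (a+r-1)(a+r-2)...r, eq. (2.5), for a positive integer a."""
    prod = Fraction(1)
    for k in range(a):
        prod *= r + k
    return theta ** a * prod


def central_moments(r, theta=1):
    """Mean, variance, mu_3, mu_4, gamma_1 squared, and gamma_2 (excess kurtosis)."""
    r, theta = Fraction(r), Fraction(theta)
    m1, m2, m3, m4 = (raw_moment(a, r, theta) for a in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    gamma1_sq = mu3 ** 2 / var ** 3  # squared skewness, kept rational
    gamma2 = mu4 / var ** 2 - 3      # excess kurtosis
    return m1, var, mu3, mu4, gamma1_sq, gamma2


print(central_moments(Fraction(5, 2), 3))
```

For any r and θ the results agree with rθ, rθ², 2rθ³, 3r(r+2)θ⁴, γ₁² = 4/r and γ₂ = 6/r.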
If y_{r,P*} is the P* percentage point of the standardized gamma distribution (θ = 1), then ξ_{P*}, the quantile corresponding to P*, is given by

$$ \xi_{P^*} = \theta\, y_{r,P^*}. \tag{2.7} $$


Values of y_{r,P*} are available in Tables IIIA, B, C, D, and E of [10] for r = ½, 2, 3, 4, 5 and P* = .01, .05, .10, .25, .50, .75, .90, .95, .99. Values of y_{r,P*} for r = 1(1)26 and P* = .75, .90, .95 and .99 are also given in Table IIA at the end of this paper. Other relevant references for tables of y_{r,P*} are [2], [15], [18].
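A quantile y_{r,P*} in (2.7) can also be found numerically by inverting (2.3). The sketch below assumes integer r and uses plain bisection, which works because G_r(y, 1) increases in y; the helper names are hypothetical.

```python
import math


def std_gamma_cdf(x: float, r: int) -> float:
    """G_r(x, 1) for integer r: eq. (2.3) with theta = 1."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def std_gamma_quantile(r: int, p_star: float, tol: float = 1e-10) -> float:
    """y_{r,P*}: solve G_r(y, 1) = P* by bisection."""
    lo, hi = 0.0, 1.0
    while std_gamma_cdf(hi, r) < p_star:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_gamma_cdf(mid, r) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile(r: int, theta: float, p_star: float) -> float:
    """xi_{P*} = theta * y_{r,P*}, eq. (2.7)."""
    return theta * std_gamma_quantile(r, p_star)


print(round(std_gamma_quantile(3, 0.75), 5))
```

For r = 1 the quantile is simply -log(1 - P*), e.g. log 10 for P* = .90, which gives an easy check against the first row of Table IIA.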

The characteristic function φ_t(u) of the gamma variable is

$$ \phi_t(u) = (1 - i\theta u)^{-r}, \tag{2.8} $$

from which it follows that the sum of n independent gamma chance variables t_i (i = 1, 2, ..., n) with parameters θ and r_i (i = 1, 2, ..., n) again has a gamma distribution with parameters θ and

$$ r = r_1 + r_2 + \cdots + r_n. $$

In particular, the sum of n independent and identically distributed exponential chance variables has a gamma distribution with parameters θ and n.
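The additivity property can be checked by simulation. This sketch assumes θ = 2, n = 3 and a fixed seed (all arbitrary choices, not from the paper) and compares the empirical distribution of sums of exponentials with G_3 from (2.3); the helper names are hypothetical.

```python
import math
import random


def gamma_cdf_int(x: float, r: int) -> float:
    """G_r(x, 1) for integer r, eq. (2.3); x = t / theta."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def simulate_sum_of_exponentials(n: int, theta: float, trials: int, seed: int = 1961):
    """Draw `trials` sums of n i.i.d. exponential variables with mean theta."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0 / theta) for _ in range(n)) for _ in range(trials)]


n, theta, s = 3, 2.0, 6.0
sums = simulate_sum_of_exponentials(n, theta, trials=100_000)
empirical = sum(v <= s for v in sums) / len(sums)
theoretical = gamma_cdf_int(s / theta, n)  # G_3(6, 2), about 0.5768
print(f"empirical {empirical:.4f} vs gamma c.d.f. {theoretical:.4f}")
```

With 100,000 trials the two probabilities should agree to roughly two decimal places, and the sample mean should be close to nθ = 6.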

For a summary of mathematical properties of the incomplete gamma func- 
tion, see reference [7]. 


ESTIMATION OF THE PARAMETERS θ AND r


As mentioned in the introduction, references [3], [9], [19], [10] have re- 
sults pertaining to the estimation of the parameters of the gamma distribution. 
For the sake of completeness we shall give here the maximum likelihood and 
moment estimators for the parameters r and @ of the gamma distribution. 


(a) Maximum Likelihood Estimators of θ and r


Suppose t₁, t₂, ..., t_n are independent observations from g_r(t, θ); then the logarithm of the likelihood function L is

$$ \log_e L = -rn \log_e \theta - n \log_e \Gamma(r) - \frac{1}{\theta}\sum_{i=1}^{n} t_i + (r-1)\sum_{i=1}^{n} \log_e t_i, \tag{2.9} $$


and the equations determining the maximum likelihood estimates r̂ and θ̂ are

$$ -\frac{n\hat r}{\hat\theta} + \frac{1}{\hat\theta^{2}}\sum_{i=1}^{n} t_i = 0, \tag{2.10} $$

$$ -n\psi(\hat r) - n\log_e\hat\theta + \sum_{i=1}^{n} \log_e t_i = 0, \tag{2.11} $$

where

$$ \psi(r) = \frac{d}{dr}\log_e \Gamma(r). $$

From (2.10) we note that θ̂ = t̄/r̂, where t̄ = Σ t_i/n, which provides an estimate of θ if r is known or has otherwise been estimated. In general, the two estimates r̂ and θ̂ have to be determined by iterative procedures applied to (2.10) and (2.11). When r is assumed to be an integer, the function ψ(r), known as the digamma function, reduces to the simple form

$$ \psi(r) = -\gamma + 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r-1}, \qquad r \ge 2, \tag{2.12} $$

where γ is Euler's constant = .5772156649.

Since the computation of ψ(r) (it is an infinite series) is in general inconvenient, the likelihood or log likelihood function itself can be used to evaluate θ̂ and r̂ (approximately), provided we assume that r takes only a finite number of values. The iterative procedure makes use of the fact that θ̂ = t̄/r, so that L becomes a function of r only. For given t₁, t₂, ..., t_n, L can be evaluated over the finite set of values of r, and we select the value of r which maximizes L.
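The grid-search version of maximum likelihood just described can be sketched as follows. The grid of half-integer values of r, the simulated sample and the function names are illustrative assumptions; `random.gammavariate(alpha, beta)` uses the same shape/scale parametrization as (2.1), with alpha = r and beta = θ.

```python
import math
import random


def profile_loglik(r: float, data: list[float]) -> float:
    """log L of (2.9) with theta replaced by t_bar / r, so it depends on r only."""
    n = len(data)
    total = sum(data)
    theta = (total / n) / r
    sum_log = sum(math.log(t) for t in data)
    return (-r * n * math.log(theta) - n * math.lgamma(r)
            - total / theta + (r - 1) * sum_log)


def mle_over_grid(data: list[float], r_grid: list[float]) -> tuple[float, float]:
    """Choose r on the grid maximizing L, then theta_hat = t_bar / r_hat from (2.10)."""
    r_hat = max(r_grid, key=lambda r: profile_loglik(r, data))
    theta_hat = (sum(data) / len(data)) / r_hat
    return r_hat, theta_hat


rng = random.Random(42)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(2000)]  # true r = 3, theta = 2
grid = [0.5 * k for k in range(1, 13)]  # r = 0.5, 1.0, ..., 6.0
r_hat, theta_hat = mle_over_grid(sample, grid)
print(r_hat, round(theta_hat, 3))
```

Substituting θ = t̄/r makes the term Σ t_i/θ equal to nr, so each grid point costs only one pass over the data.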


(b) Moment Estimators

The moment estimates r̂ and θ̂ are given by the equations





[Fig. I. Standard gamma density functions.]


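$$ \bar t = \hat r\hat\theta, \tag{2.13} $$

$$ \frac{\sum_{i=1}^{n} (t_i - \bar t)^2}{n-1} = \hat r\hat\theta^{2}. \tag{2.14} $$

Equations (2.13) and (2.14) give explicit solutions for r̂ and θ̂ as

$$ \hat\theta = \frac{\sum_{i=1}^{n} (t_i - \bar t)^2}{\bar t\,(n-1)}, \tag{2.15} $$

$$ \hat r = \frac{\bar t^{\,2}\,(n-1)}{\sum_{i=1}^{n} (t_i - \bar t)^2}. \tag{2.16} $$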
It should be noted that t̄ and Σ(t_i - t̄)²/(n - 1) are unbiased estimates of rθ and rθ², respectively. The estimates given by the method of moments are easy to compute, though (in general) they are not the best possible from the efficiency point of view (see reference [4, p. 498]).

[Fig. I (Cont'd). Standard gamma density functions.]
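The moment estimators (2.15) and (2.16) translate directly into code. The following is a minimal sketch with a hypothetical function name, applied to a toy data set chosen only for illustration.

```python
def moment_estimates(data: list[float]) -> tuple[float, float]:
    """Return (r_hat, theta_hat) from (2.15) and (2.16)."""
    n = len(data)
    t_bar = sum(data) / n
    s2 = sum((t - t_bar) ** 2 for t in data) / (n - 1)  # unbiased for r * theta^2
    theta_hat = s2 / t_bar   # (2.15)
    r_hat = t_bar ** 2 / s2  # (2.16)
    return r_hat, theta_hat


r_hat, theta_hat = moment_estimates([1.0, 2.0, 3.0, 4.0, 5.0])
print(r_hat, theta_hat)  # t_bar = 3 and s^2 = 2.5 for this toy sample
```

By construction r̂θ̂ equals t̄, which is (2.13).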


3. ACCEPTANCE SAMPLING PLANS AND ASSOCIATED OPERATING 
CHARACTERISTIC FUNCTIONS 


We assume that the lifetime follows a gamma distribution. A common practice in life testing is to terminate a life test at a preassigned time t and note the number of failures (assuming that a failure is well-defined). One object of these experiments is to set a lower confidence limit on the mean life. It is then desired to establish a specified mean life with a probability of at least P*. The decision to accept the specified mean life occurs if and only if the number of observed failures at the end of the fixed time t does not exceed a given number c. (One can terminate the test before time t is reached when the number of failures exceeds c; the decision in this case is to reject.) For such a truncated life test and the associated decision rule we are interested in obtaining sampling plans, i.e., we want to find the smallest sample sizes necessary to achieve our objective. A sampling plan consists of (i) the number of units n on test, (ii) an acceptance number c such that if c or fewer failures occur during the fixed time t, the lot is accepted, and (iii) the ratio t/rθ₀, where rθ₀ is the specified mean life. The numbers r and θ₀ are known. The choice of c and t (and hence of n) will in general be based on the producer's risk, which is the probability of rejecting a good lot, i.e., a lot for which the true mean life rθ ≥ the specified mean life rθ₀. The consumer's risk is fixed in our formulation and cannot exceed 1 - P*. Thus the probability P* is a confidence level in the sense that the chance of rejecting a lot having a parameter θ < θ₀ is at least equal to P*. For a fixed P*, our sampling plans are characterized by the triplet (n, c, t/rθ₀).

It should be pointed out here that by a lot we mean a lot of infinite (large) size, so that binomial distribution theory applies and the acceptance or rejection of the lot is equivalent to the acceptance or rejection of the hypothesis θ ≥ θ₀. Mathematically, given a number P* (0 < P* < 1), a time t, a value θ₀ of θ and an acceptance number c, we want to find the smallest positive integer n such that, if the observed number of failures in time t does not exceed c, we can assert with probability P* that θ ≥ θ₀ or, equivalently, that the mean life rθ ≥ rθ₀.

The required n is the smallest positive integer which satisfies the inequality

$$ \sum_{i=0}^{c} \binom{n}{i} p^i (1-p)^{n-i} \le 1 - P^*, \tag{3.1} $$

where p = G_r(t, θ₀) is the probability of a failure in time t if the true mean life is rθ₀. This follows from the fact that the chance of observing x failures during t is the binomial probability $\binom{n}{x} p^x (1-p)^{n-x}$, and also from the theory of confidence intervals.

Since G_r(t, θ₀) depends only on the ratio t/θ₀, the experimenter needs to specify only this ratio.
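Since (3.1) needs only p = G_r(t, θ₀), the minimum sample size can be searched for directly. The sketch below assumes integer r (so that (2.3) applies) and uses the fact that the left side of (3.1) decreases as n grows, which allows a doubling-then-bisection search instead of scanning every n. The function names are hypothetical; this is not a reconstruction of the authors' IBM 650 program.

```python
import math


def gamma_cdf_int(x: float, r: int) -> float:
    """G_r(t, theta) for integer r via (2.3), where x = t / theta."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j  # x^j / j!
        total += term
    return 1.0 - math.exp(-x) * total


def binom_cdf(c: int, n: int, p: float) -> float:
    """Left side of (3.1): probability of at most c failures among n units."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(c + 1))


def min_sample_size(r: int, t_over_r_theta0: float, c: int, p_star: float) -> int:
    """Smallest n satisfying (3.1) for the plan (n, c, t/r theta0), integer r only."""
    p = gamma_cdf_int(r * t_over_r_theta0, r)  # t/theta0 = r * (t / r theta0)
    target = 1.0 - p_star
    lo, hi = c, c + 1  # with n = c units the left side of (3.1) equals 1
    while binom_cdf(c, hi, p) > target:  # grow hi until (3.1) holds
        lo, hi = hi, 2 * hi
    while hi - lo > 1:  # invariant: lo violates (3.1), hi satisfies it
        mid = (lo + hi) // 2
        if binom_cdf(c, mid, p) > target:
            lo = mid
        else:
            hi = mid
    return hi


print(min_sample_size(2, 0.10, 2, 0.75))  # 223, as in Table IB
```

For r = 2, t/rθ₀ = .10, c = 2 and P* = .75 this gives 223, the Table IB entry used in Example 1.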

If the observed number of failures is less than or equal to c, then from (3.1) we can make the confidence statement that G_r(t, θ) ≤ G_r(t, θ₀) with probability P*. Since G_r(t, θ) is a monotonically increasing function of t/θ (its derivative with respect to x = t/θ is the standardized density g_r(x, 1), which is positive), it follows that

$$ G_r(t,\theta) \le G_r(t,\theta_0) \iff \theta \ge \theta_0 \iff r\theta \ge r\theta_0. \tag{3.2} $$


Minimum sample sizes satisfying the inequality (3.1) have been obtained for r = ½, 2, 3, 4, and 5, P* = .75, .90, .95, .99, and selected values of t/rθ₀. These are given in Tables IA, B, C, D, and E.

It should be pointed out that the above tables are also applicable to the problem of the minimum sample size for setting confidence limits on the parameter p (= G_r(t, θ₀)) of a binomial distribution.








OPERATING CHARACTERISTICS OF THE SAMPLING PLANS
(n, c, t/rθ₀) AND PRODUCER'S RISK


The Operating Characteristic (OC) function of the sampling plan (n, c, t/rθ₀) gives the probability of accepting the lot. For the above sampling plan and any value of θ, this probability is given by

$$ P\{\text{Accepting the lot}\} = \sum_{i=0}^{c} \binom{n}{i} p^i (1-p)^{n-i} = 1 - I_p(c+1,\, n-c), \tag{3.3} $$

where I_p(c+1, n-c) is the incomplete beta function and where p = G_r(t, θ) can be computed by using formula (2.3) if r is an integer, or (2.4) if r is a half-integer. For fixed t, 1 - G_r(t, θ) is a monotonically increasing function of θ, and I_x(a, b) is an increasing function of x (0 < x < 1); it follows that the probability of acceptance, i.e., the OC function, is an increasing function of θ. The value of this function approximately equals 1 - P* at θ = θ₀ (it is less than or equal to 1 - P* because of the discreteness of the distribution). For θ > θ₀, the value of the OC function is greater than 1 - P*.

Formula (3.3) shows how to compute the OC function of a sampling plan. The OC function is important for deciding on a particular sampling plan: for given P* and t/rθ₀, the choice of c and of n will be made on the basis of the OC function. OC curves for a few selected sampling plans are drawn in Figures IIIA and IIIB.

The producer's risk is the probability of rejecting the lot when θ ≥ θ₀, and is given by

$$ \text{Producer's Risk} = 1 - P\{\text{Acceptance} \mid \theta\} = \sum_{i=c+1}^{n} \binom{n}{i} p^i (1-p)^{n-i} = I_p(c+1,\, n-c), \tag{3.4} $$

where the values of θ entering into p = G_r(t, θ) are greater than or equal to θ₀. Thus, for any given θ ≥ θ₀, we can compute the producer's risk by first finding p and then using a binomial distribution table or a table of the incomplete beta function.
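The OC function (3.3) and the producer's risk (3.4) can be evaluated in the same way. This is a sketch for integer r, with θ entering only through the ratio θ/θ₀; the plan (223, 2, .10) with r = 2 is taken from Example 1, and the function names are illustrative.

```python
import math


def gamma_cdf_int(x: float, r: int) -> float:
    """G_r(t, theta) for integer r via (2.3), where x = t / theta."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def binom_cdf(c: int, n: int, p: float) -> float:
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(c + 1))


def oc(n: int, c: int, r: int, t_over_r_theta0: float, theta_ratio: float) -> float:
    """(3.3): probability of acceptance when the true theta is theta_ratio * theta0."""
    x = r * t_over_r_theta0 / theta_ratio  # t/theta = (t/theta0) / (theta/theta0)
    return binom_cdf(c, n, gamma_cdf_int(x, r))


def producers_risk(n: int, c: int, r: int, t_over_r_theta0: float, theta_ratio: float) -> float:
    """(3.4): one minus the OC function, meaningful for theta_ratio >= 1."""
    return 1.0 - oc(n, c, r, t_over_r_theta0, theta_ratio)


for ratio in (1.0, 1.5, 2.0, 3.0):
    print(ratio, round(oc(223, 2, 2, 0.10, ratio), 4))
```

The printed values increase with θ/θ₀, starting at or just below 1 - P* = .25 when θ = θ₀.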

For a given value of the producer's risk, .05 (say), the following question can be asked: what value of θ/θ₀ will ensure a producer's risk of .05 or less if the sampling plan under discussion is adopted? This value of θ/θ₀ is the smallest number (> 1) for which p = G_r(t, θ) satisfies the inequality

$$ I_{1-p}(n-c,\, c+1) = \sum_{i=0}^{c} \binom{n}{i} p^i (1-p)^{n-i} \ge .95. \tag{3.5} $$

It should be noted that the quantity θ/θ₀ occurs in the expression for p = G_r(t, θ). For example, for r = 2,

$$ p = G_2(t,\theta) = 1 - e^{-(t/\theta_0)(\theta_0/\theta)} - e^{-(t/\theta_0)(\theta_0/\theta)}\,\frac{t}{\theta_0}\,\frac{\theta_0}{\theta}. \tag{3.6} $$


A brief table of the quantity θ/θ₀ for some sampling plans (n, c, t/rθ₀) corresponding to P* = .90 is given in this paper as Table V, for r = 2 and a producer's risk of .05.
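Because the OC function increases with θ, the smallest θ/θ₀ satisfying (3.5) can be found by bisection. The sketch below (integer r only, hypothetical names) is one way to reproduce values of the kind listed in Table V; it assumes the OC value at θ = θ₀ is below 1 - risk, which holds whenever P* exceeds the producer's risk.

```python
import math


def gamma_cdf_int(x: float, r: int) -> float:
    """G_r(t, theta) for integer r via (2.3), where x = t / theta."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def binom_cdf(c: int, n: int, p: float) -> float:
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(c + 1))


def oc(n: int, c: int, r: int, t_over_r_theta0: float, theta_ratio: float) -> float:
    return binom_cdf(c, n, gamma_cdf_int(r * t_over_r_theta0 / theta_ratio, r))


def min_theta_ratio(n: int, c: int, r: int, t_over_r_theta0: float,
                    risk: float = 0.05, tol: float = 1e-6) -> float:
    """Smallest theta/theta0 for which (3.5) holds, i.e. P{accept} >= 1 - risk."""
    target = 1.0 - risk
    lo, hi = 1.0, 2.0
    while oc(n, c, r, t_over_r_theta0, hi) < target:  # bracket the answer
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oc(n, c, r, t_over_r_theta0, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(round(min_theta_ratio(85, 2, 2, 0.20), 3))  # plan of Example 4
print(round(min_theta_ratio(79, 0, 2, 0.10), 3))  # c = 0 plan of Example 1
```

The two calls correspond to the plan (85, 2, .20) of Example 4 and the c = 0 plan (79, 0, .10) of Example 1, for which the text quotes 2.74 and 5.48.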


4. AN APPROXIMATION FOR THE MINIMUM SAMPLE SIZE 


The existing tables [11] and [14] of the binomial distribution are not adequate for solving for the n values. The values in Tables IA, B, C, D, and E were obtained on an IBM 650 by a trial-and-error method on n, making use, of course, of the monotonic behavior of the n values with respect to p. But even here difficulty was experienced in obtaining the exact values for small values of p. For these reasons a method of approximating the minimum sample sizes was developed. This method uses the fact that if p is small and n is large (as is true in our problem), the binomial distribution is approximated by the Poisson distribution with parameter λ = np. Thus, the left side of inequality (3.1) can be approximated, giving





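$$ \sum_{i=0}^{c} \frac{e^{-\lambda}\lambda^{i}}{i!} \le 1 - P^*, \tag{4.1} $$

where λ = nG_r(t, θ₀). It follows from the definition of G_r(t, θ) that

$$ \sum_{i=0}^{c} \frac{e^{-\lambda}\lambda^{i}}{i!} = 1 - G_{c+1}(\lambda, 1). \tag{4.2} $$

Thus, we have from (4.1)

$$ G_{c+1}(\lambda, 1) \ge P^*. \tag{4.3} $$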


Thus, the value of λ = np = nG_r(t, θ₀) must be at least the P* percentage point of a standardized gamma distribution with parameter r = c + 1. This leads us to the following approximation formula for n:

$$ n = \left[\frac{y_{c+1,P^*}}{G_r(t,\theta_0)}\right] + 1, \tag{4.4} $$

where y_{c+1,P*} denotes the P* percentage point of a standardized gamma variable with parameter r = c + 1, or one-half times the P* percentage point of a χ² with (2c + 2) degrees of freedom. The symbol [x] stands for the largest integer less than or equal to x, and G_r(t, θ₀) is the c.d.f. of the gamma distribution with parameters r and θ₀. Both y_{c+1,P*} and G_r(t, θ₀) are given in Tables IIA and IIB, for selected values of c and P* and for selected values of r and t/rθ₀, respectively. The values in both Tables IIA and IIB go to a higher number of decimal places than those available in existing tables. Thus, with the help of these tables one can obtain an approximate value of the minimum sample size n. Since the Poisson approximation to the binomial improves as n increases and as p = G_r(t, θ₀) decreases, our approximation formula (4.4) for n improves as t/rθ₀ decreases.
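Formula (4.4) combines a Table IIA entry, computed here by bisection on (2.3), with a Table IIB entry. The sketch below assumes integer r and uses illustrative names.

```python
import math


def std_gamma_cdf(x: float, r: int) -> float:
    """G_r(x, 1) for integer r, eq. (2.3) with theta = 1."""
    term, total = 1.0, 0.0
    for j in range(r):
        if j > 0:
            term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def std_gamma_quantile(r: int, p_star: float, tol: float = 1e-10) -> float:
    """y_{r,P*} by bisection on the increasing function G_r(y, 1)."""
    lo, hi = 0.0, 1.0
    while std_gamma_cdf(hi, r) < p_star:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_gamma_cdf(mid, r) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_min_sample_size(r: int, t_over_r_theta0: float, c: int, p_star: float) -> int:
    """(4.4): n = [y_{c+1,P*} / G_r(t, theta0)] + 1, for integer r."""
    y = std_gamma_quantile(c + 1, p_star)      # Table IIA entry
    p = std_gamma_cdf(r * t_over_r_theta0, r)  # Table IIB entry, G_r(t, theta0)
    return math.floor(y / p) + 1


y = std_gamma_quantile(3, 0.75)
p = std_gamma_cdf(0.2, 2)
print(round(y / p, 2), approx_min_sample_size(2, 0.10, 2, 0.75))
```

For the plan of Example 1 the ratio is about 223.7, so (4.4) gives 224 while the exact value from (3.1) is 223; Table III and Example 2 instead take the largest integer in the ratio.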

The relative error in using the approximation was computed for a few selected cases, and the results, given in Table III, indicate the accuracy of this approximation.


5. FAILURE RATE (FORCE OF MORTALITY, INTENSITY FUNCTION) 


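The conditional failure rate at time t associated with a life density f(t, θ) is also known as the force of mortality, intensity function, or hazard rate. We shall refer to it as the failure rate. It is defined by

$$ \text{FR (Failure Rate)} = \lim_{h\to 0} \frac{\int_t^{t+h} f(x,\theta)\,dx}{h\int_t^{\infty} f(x,\theta)\,dx} = \frac{f(t,\theta)}{1 - F(t,\theta)}, \tag{5.1} $$

F(t, θ) being the corresponding c.d.f. For the gamma distribution with parameters r and θ we have

$$ \text{FR} = \frac{e^{-t/\theta}\, t^{r-1}}{\Gamma(r)\,\theta^{r}\,[1 - G_r(t,\theta)]}, \tag{5.2} $$

so that

$$ \theta(\text{FR}) = \frac{e^{-x}\, x^{r-1}}{\Gamma(r)\,[1 - G_r(x,1)]}, \tag{5.3} $$

where x = t/θ.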

For any r and x = t/θ, the right-hand side can be computed to give the value of θ times the failure rate. This was done for r = ½, 2, 3, 4 and 5 and selected values of x. The results of these computations are given in Table IV.

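Formula (5.3) reduces to the following simple form when r is an integer,

$$ \theta(\text{FR}) = \frac{x^{r-1}}{(r-1)!\,\sum_{j=0}^{r-1} x^{j}/j!}, \tag{5.4} $$

and for r = ½ it reduces to

$$ \theta(\text{FR}) = \frac{e^{-x}}{\sqrt{\pi x}\,\bigl(1 - \operatorname{erf}\sqrt{x}\bigr)}. \tag{5.5} $$

Since, when r = 1, i.e., for the exponential case,

$$ G_1(x,1) = 1 - e^{-x}, \tag{5.6} $$

we see from (5.3) that

$$ \text{FR} = \frac{1}{\theta}. $$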





For r < 1 it is easy to show from (5.3) that the failure rate is a monotone decreasing function of x = t/θ. It approaches the limiting value 1/θ as x → ∞. Similarly, for r > 1, the failure rate is an increasing function of x = t/θ and approaches the limit 1/θ as x → ∞. Figure II shows the graphs of θ times the failure rate for r = ½, 1, 2, 3, 4, and 5. It should be noted that for fixed θ and t, this quantity decreases as r increases.
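The integer and r = ½ forms (5.4) and (5.5) can be coded directly. In this sketch (hypothetical names), 1 - erf√x is computed with `math.erfc` to avoid cancellation for large x, and the last function applies the per-cent-per-thousand-hours conversion used in Example 3 (divide by θ in hours, multiply by 10⁵).

```python
import math


def theta_times_fr(x: float, r: float) -> float:
    """theta * FR at x = t/theta: (5.4) for integer r, (5.5) for r = 1/2."""
    if float(r).is_integer() and r >= 1:
        r = int(r)
        term, total = 1.0, 0.0
        for j in range(r):
            if j > 0:
                term *= x / j
            total += term  # sum_{j=0}^{r-1} x^j / j!
        return x ** (r - 1) / (math.factorial(r - 1) * total)
    if r == 0.5:
        # erfc(y) = 1 - erf(y), evaluated without cancellation
        return math.exp(-x) / (math.sqrt(math.pi * x) * math.erfc(math.sqrt(x)))
    raise ValueError("this sketch handles integer r and r = 1/2 only")


def fr_percent_per_thousand_hours(x: float, r: float, theta_hours: float) -> float:
    """Failure rate in per cent per thousand hours: (theta * FR) / theta * 10^5."""
    return theta_times_fr(x, r) / theta_hours * 1e5


print(round(theta_times_fr(0.4, 2), 5), round(fr_percent_per_thousand_hours(0.4, 2, 10_000), 4))
```

The r = 1 case returns 1 for every x, matching FR = 1/θ, and the r = 2, x = .40 call reproduces the numbers of Example 3.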






































[Fig. II. Failure rates for the gamma distribution. Ordinate: θ times failure rate. To obtain the failure rate in per cent per thousand hours, divide the ordinate by θ and multiply by 10⁵, where rθ is the mean life.]


6. DESCRIPTION OF THE TABLES AND EXAMPLES 


Table I gives the minimum sample sizes necessary to assure a mean life. These tables are for r = ½, 2, 3, 4, and 5 and for P* = .75, .90, .95, and .99. The c-values range over 0(1)15 in all cases; the range of t/rθ₀ varies with r. The use of these tables is illustrated by the following example.


Example 1. Choosing Sample Size Given t/rθ₀, c and P*


Assuming that the life distribution is gamma with parameter r = 2, the experimenter is interested in establishing that the true unknown mean life rθ is at least equal to rθ₀ = 10,000 hours with probability P* = .75. Suppose it is desired to stop the life test at t = 1,000 hours. Then, for an acceptance number c = 2, the required n is the entry in Table IB corresponding to P* = .75, t/rθ₀ = .10 and c = 2, namely n = 223. Thus 223 units have to be put on test. If no more than 2 failures are observed during 1,000 hours, the experimenter can assert at a confidence level of .75 that the mean life is at least equal to 10,000 hours. For this sampling plan, we find from the appropriate OC curve in Figure IIIB that the associated producer's risk is approximately .10 or less if the actual mean life rθ is twice the specified mean life rθ₀. Note that for various choices of c (and hence of n), Table V (for r = 2) gives the value of θ/θ₀ needed for the producer's risk not to exceed .05. Thus for the above example the values of θ/θ₀ for c = 0, 1, 2 and 3 are 5.48, 2.86, 2.27 and 1.99; the actual mean life necessary in order to ship 95 per cent of the lots (together with cost and other extra-statistical considerations) will play the key role in deciding which c to choose.

[Fig. IIIA. OC curves for given sampling plans. Ordinate: probability of acceptance; abscissa: θ/θ₀.]







Table II provides an approximate answer for the minimum sample size. The approximate value is obtained by dividing the entry for the corresponding set of values in Table IIA by the appropriate entry in Table IIB. It should be noted that the approximate value has been found to be larger than the exact value in all cases studied, and that the approximation improves as t/rθ₀ decreases. This is also borne out by Table III.





[Fig. IIIB. OC curves for given sampling plans. Ordinate: probability of acceptance.]


Example 2. Application of the Approximation Formula 


For the data of Example 1, we find from Table IIA, corresponding to P* = .75 and c = 2, the tabulated value 3.92040. From Table IIB, corresponding to t/rθ₀ = .10 and r = 2, the tabulated value is .0175231. Thus, the approximate minimum sample size is the largest integer in

$$ \frac{3.92040}{.0175231} = 223.73, $$

namely 223.


Table III gives an estimate of the relative per cent error in using the approximation for selected values of P*, r, c and t/rθ₀. For this comparison the approximate n was taken to be the largest integer less than or equal to the ratio of the two numbers in Tables IIA and IIB, respectively, since on empirical evidence it was felt that the confidence level would still be satisfied with this modification of our approximation.

Table IV gives the values of θ times the failure rate as a function of t/θ for r = ½, 2, 3, 4, and 5.


Example 3. Finding the Failure Rate 


It is known that the lifetime follows a gamma distribution with parameter r = 2 and θ = 10,000 hours. It is required to find the failure rate at time t = 4,000 hours. Since t/θ = .40, we find from Table IV that

θ(FR) = .28571,

so that FR = .28571/10,000 per hour. Expressed in per cent per thousand hours (a prevalent method among engineers for expressing the failure rate, not to be confused with an average failure rate over one thousand hours), i.e., multiplied by 10⁵, this failure rate is 2.8571 per cent per thousand hours.

For given sampling plans, Table V gives the ratio of the true mean life to the specified mean life needed in order that a lot of this better quality be accepted, on the average, 19 out of 20 times (producer's risk = .05). Equivalently, this table gives the factors by which the mean life of the manufacturer's product should exceed the specified mean life so that the specified mean life can be established with probability P* in 19 out of 20 cases on the average.


Example 4. The Ratio of the Actual Mean Life to the Specified Mean Life for 
Given Producer’s Risk and Given Sampling Plans 


The life of a certain electronic device follows a gamma distribution with parameter r = 2. The consumer wants a probability P* = .90 of not accepting a quality worse than the specified one. The sampling plan is to be based on an acceptance number c = 2 and on t/rθ₀ = .20. What should the quality of the producer's product be so that his risk will be .05, i.e., so that the product will be accepted in 19 out of 20 cases on the average?

From Table V, we find that the entry for r = 2, P* = .90, c = 2 and t/rθ₀ = .20 is 2.74. Thus, the manufacturer's product should have a mean life at least 2.74 times the specified mean life in order that, under the above acceptance sampling plan, the product be accepted with probability .95. Note that for this case n = 85, so that the sampling plan is (85, 2, .20).


7. ACKNOWLEDGMENTS 


The authors are thankful to Messrs. J. A. Tischendorf and D. S. Peck for constructive conversations on some topics in this paper. The authors also wish to thank a referee for some helpful comments.


(References will be found on page 970) 





TABLE IA

r = ½

The entries in these tables are the minimum sample sizes to be tested for a time t in order to assert with probability P* that θ ≥ θ₀ or, equivalently, that the average life rθ ≥ rθ₀. The parameters θ and r, where r is assumed to be known, characterize the underlying gamma density function for life, viz.,

$$ g_r(t,\theta) = \frac{1}{\theta\,\Gamma(r)}\, e^{-t/\theta} \left(\frac{t}{\theta}\right)^{r-1}. $$

For the above confidence statement the number of failures in time t should be ≤ c.








P* = .75

t/rθ₀
1.0   .75

c
0
1
2
3
4
5
6
7
8
9
















TABLE IA—(Continued) 








P* =.95 



































TABLE IB

r = 2

The entries in these tables are the minimum sample sizes to be tested for a time t in order to assert with probability P* that θ ≥ θ₀ or, equivalently, that the average life rθ ≥ rθ₀. The parameters θ and r, where r is assumed to be known, characterize the underlying gamma density function for life, viz.,

$$ g_r(t,\theta) = \frac{1}{\theta\,\Gamma(r)}\, e^{-t/\theta} \left(\frac{t}{\theta}\right)^{r-1}. $$

For the above confidence statement the number of failures in time t should be ≤ c.
































P* = .75
                                   t/rθ₀
  c   1.0   .75   .50   .20    .10    .075     .05      .02       .01
  0     2     3     5    22     79     136     296    1,779     7,021
  1     4     6    10    43    153     264     575    3,456    13,634
  2     6     8    14    63    223     384     837    5,032    19,848
  3     8    11    19    82    291     501   1,091    6,557    25,865
  4    10    13    23   101    357     615   1,340    8,053    31,760
  5    12    16    27   120    423     728   1,586    9,526    37,569
  6    13    18    32   138    488     839   1,828   10,984    43,314
  7    15    21    36   156    552     950   2,069   12,429    49,009
  8    17    23    40   175    616   1,060   2,308   13,863    54,663
  9    19    26    44   193    679   1,169   2,545   15,289    60,283
 10    21    28    48   210    742   1,277   2,782   16,708    65,874
 11    22    31    52   228    805   1,385   3,017   18,121    71,440
 12    24    33    56   246    867   1,493   3,251   19,528    76,985
 13    26    36    60   264    930   1,600   3,485   20,931    82,509
 14    28    38    65   281    992   1,707   3,718   22,320    88,017
 15    30    40    69   299  1,054   1,814   3,950   23,723    93,509

P* = .90
                                   t/rθ₀
  c   1.0   .75   .50   .20    .10    .075     .05      .02       .01
  0     3     4     8    37    131     225     491    2,955    11,661
  1     5     5    14    62    221     381     830    4,992    19,696
  2   ...    11    19    85    303     521   1,136    6,830    26,948
  3    10    14    24   107    380     655   1,427    8,574    33,824
  4    12    16    29   128    455     783   1,707   10,259    40,467
  5    14    19    33   149    528     909   1,981   11,903    46,950
  6    16    22    38   169    599   1,032   2,249   13,516    53,312
  7    18    24    43   189    670   1,154   2,514   15,106    59,580
  8    19    27    47   209    740   1,274   2,775   16,676    65,771
  9    21    30    51   229    809   1,393   3,034   18,231    71,899
 10    23    32    56   248    877   1,510   3,291   19,772    77,972
 11   ...   ...   ...   ...    945   1,627   3,545   21,301    83,999
 12    27    37    65   ...  1,012   1,743   3,798   22,819    89,984
 13    29    40    69   305  1,079   1,859   4,049   24,329    95,934
 14    31    43    73   324  1,146   1,974   4,299   25,830   101,851
 15    33   ...    78   ...  1,212   2,088   4,548   27,324   107,739
















TABLE IB—(Continued) 


























P* = .95
                                   t/rθ₀
  c   1.0   .75   .50   .20    .10    .075     .05      .02       .01
  0     4     6    10    48    170     293     639    3,844    15,171
  1     6     9    16    76    269     464   1,013    6,088    24,022
  2     9    12    22   101    358     616   1,344    8,079    31,878
  3    11    15    27   124    441     759   1,655    9,950    39,257
  4    13    18    32   147    520     897   1,954   11,747    46,542
  5    15    21    37   169    598   1,030   2,245   13,492    53,222
  6    17    24    42   190    673   1,160   2,529   15,198    59,949
  7    19    27    47   211    748   1,288   2,808   16,873    66,557
  8    21    30    52   232    821   1,414   3,082   18,524    73,066
  9    23    32    56   252    893   1,539   3,354   20,155    79,495
 10    25    35    61   273    965   1,662   3,622   21,768    85,854
 11    27    38    66   293  1,036   1,784   3,888   23,366    92,154
 12    29    40    70   313  1,106   1,906   4,152   24,951    98,402
 13    31    43    75   332  1,176   2,026   4,414   26,524   104,603
 14    33    46   ...   352  1,246   2,145   4,674   28,087   110,764
 15    35    48   ...   372  1,315   2,264   4,933   29,640   116,887

P* = .99
                                   t/rθ₀
  c   1.0   .75   .50   .20    .10    .075     .05      .02       .01
  0     6     8    16    73    261     450     982    5,909    23,321
  1     8    12    23   105    377     649   1,416    8,519    33,615
  2    11    16    29   134    477     823   1,794   10,787    42,564
  3    13    19    35   160    570     983   2,144   12,891    50,862
  4    16    23    40   185    659   1,136   2,477   14,892    58,756
  5    18    26    46   209    744   1,283   2,798   16,822    66,367
  6    20    29    51   233    828   1,427   3,110   18,698    73,768
  7    22    32    56   256    909   1,567   3,416   20,533    81,001
  8    24    35    61   278    989   1,704   3,715   22,333    88,100
  9    26    38    66   301  1,067   1,840   4,010   24,104    95,086
 10    28    40    71   323  1,145   1,973   4,301   25,851   101,976
 11    30    43    76   344  1,222   2,105   4,588   27,578   108,783
 12    33    46    81   366  1,297   2,236   4,873   29,286   115,517
 13    35    49    86   387  1,372   2,365   5,154   30,977   122,187
 14    37    52    91   408  1,447   2,493   5,433   32,655   128,800
 15    39    54    95   429  1,521   2,620   5,710   34,319   135,361











TABLE IC

r = 3

The entries in these tables are the minimum sample sizes to be tested for a time t in order to assert with probability P* that θ ≥ θ₀ or, equivalently, that the average life rθ ≥ rθ₀. The parameters θ and r, where r is assumed to be known, characterize the underlying gamma density function for life, viz.,

$$ g_r(t,\theta) = \frac{1}{\theta\,\Gamma(r)}\, e^{-t/\theta} \left(\frac{t}{\theta}\right)^{r-1}. $$

For the above confidence statement the number of failures in time t should be ≤ c.









































P* = .75
                          t/rθ₀
  c   1.0   .75   .50    .20     .10    .075      .05
  0     2     3     7     60     385     863    2,756
  1     4     6    14    116     748   1,677    5,353
  2     6     9    20    169   1,089   2,442    7,793
  3   ...    12    26    220   1,419   3,182   10,157
  4    10    15    32    271   1,742   3,908   12,472
  5    12    18    38    320   2,061   4,623   14,754
  6    14    21    44    369   2,377   5,330   17,012
  7    16    24    50    418   2,690   6,032   19,249
  8    18    27    56    466   3,000   6,728   21,471
  9    19    29    61    514   3,309   7,420   23,679
 10    21    32    67    562   3,616   8,109   25,877
 11    23    35    73    610   3,922   8,795   28,064
 12    25    38    78    657   4,226   9,478   30,243
 13    27    40    84    704   4,530  10,159   32,415
 14    29    43    90    752   4,833  10,837   34,580
 15    31    46    95    798   5,134  11,514   36,739

P* = .90
                          t/rθ₀
  c   1.0   .75   .50    .20     .10    .075      .05
  0     3     5    11     99     639   1,434    4,577
  1     6     9    19    167   1,080   2,422    7,732
  2     8    12    27    229   1,477   3,314   10,580
  3    10    16    34    288   1,855   4,160   13,280
  4    12    19    40    344   2,219   4,978   15,890
  5    14    22    47    400   2,575   5,776   18,436
  6    16    25    53    454   2,924   6,559   20,935
  7    18    28    60    507   3,268   7,331   23,397
  8    20    31    66    560   3,608   8,093   25,829
  9    22    34   ...    612   3,944   8,847   28,237
 10    24    37    78    664   4,278   9,595   30,623
 11    26    40    84    716   4,609  10,337   32,991
 12    28    43    91    767   4,937  11,074   35,342
 13    30    46    97    818   5,264  11,807   37,680
 14    32    49   103    868   5,589  12,536   40,005
 15    34    52   109    918   5,912  13,261   42,319










TABLE IC—(Continued) 































P* = .95
                          t/rθ₀
  c   1.0   .75   .50    .20     .10    .075      .05
  0     4     7    15    129     831   1,865    5,955
  1     7    11    23    204   1,317   2,954    9,430
  2     9    14    31    271   1,747   3,920   12,515
  3    11    18    39    334   2,152   4,828   15,413
  4    12    21    46    394   2,541   5,700   18,196
  5    16    24    53    453   2,918   6,547   20,898
  6    18    28    59    510   3,288   7,375   23,540
  7    20    31    66    566   3,650   8,188   26,135
  8    22    34    73    622   4,007   8,989   28,692
  9    24    37    79    677   4,360   9,780   31,217
 10    26    40    86    731   4,709  10,563   33,716
 11    28    43    92    785   5,055  11,339   36,190
 12    30    46    98    838   5,398  12,108   38,645
 13    32    49   105    891   5,739  12,872   41,081
 14    34    52   111    943   6,077  13,630   43,502
 15    36    55   115    996   6,413  14,384   45,908

P* = .99
                          t/rθ₀
  c   1.0   .75   .50    .20     .10    .075      .05
  0     6    10    22    197   1,278   2,867    9,154
  1     9    14    32    285   1,842   4,133   13,196
  2    11    19    41    361   2,333   5,234   16,710
  3    14    22    49    432   2,788   6,255   19,968
  4    16    26    57    499   3,221   7,226   23,068
  5    19    30    65    564   3,638   8,162   26,057
  6    21    33    72    627   4,044   9,073   28,963
  7    23    37    80    688   4,441   9,963   31,804
  8    25    40    87    749   4,830  10,837   34,592
  9    27    43    94    808   5,214  11,696   37,336
 10    29    46   101    867   5,592  12,544   40,042
 11   ...    50   107    925   5,965  13,382   42,716
 12    34    53   114    982   6,335  14,211   45,361
 13    36    56   121  1,039   6,701  15,032   47,981
 14    38    59   128  1,096   7,064  15,846   50,579
 15    40    62   134  1,152   7,424  16,654      ...





TABLE ID          TABLE IE

r = 4             r = 5

The entries in these tables are the minimum sample sizes to be tested for a time t in order to assert with probability P* that θ ≥ θ₀ or, equivalently, that the average life rθ ≥ rθ₀. The parameters θ and r, where r is assumed to be known, characterize the underlying gamma density function for life, viz.,

$$ g_r(t,\theta) = \frac{1}{\theta\,\Gamma(r)}\, e^{-t/\theta} \left(\frac{t}{\theta}\right)^{r-1}. $$

For the above confidence statement the number of failures in time t should be ≤ c.











P*=.75

47,358
68,948
89,855
110,338
130,525
150,492
170,285
189,937
209,472
228,908
248,257
267,532
286,740

107,324

40,501
68,414
93,607
117,495
140,579
163,104
185,212
206,993
228,508
249,803
270,910
291,855
312,659
333,338
353,904
374,371




















TABLES ID, IE—(Continued) 





P*=.95 





10 










17,401
27,554
36,567
45,033
53,162
61,056
68,776
76,357
83,827
91,204
98,503
105,733
112,903
120,021
127,091
134,119




















05 







 c
 0     81,002     26,749
 1    116,760     38,558
 2    147,846     48,823
 3    176,673     58,343
 4    204,097     67,399
 5    230,543     76,133
 6    256,253     84,623
 7    281,387     92,923
 8    306,051    101,068
 9    330,324    109,084
10    354,265    116,990
11    377,917    124,801
12    401,318    132,529
13    424,496    140,183
14    447,474    147,772
15    470,273    155,301






















TABLE II 


To obtain an approximate value of the minimum sample size of Table I, divide the entry in Table IIA for the given c and P* by the entry in Table IIB for the given r and t/rθ₀.
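One way to see why this division works is through a Poisson approximation to the binomial count of failures. This is our reconstruction, not the article's own derivation, which is not part of this excerpt. Write p = G_r(t, θ₀) for the chance that one item fails by time t. Write G_{c+1}(·, 1) for the gamma cdf with shape c + 1 and unit scale, and γ_{P*}(c + 1) for its P*-point (the Table IIA entry). Then:

$$\Pr(X\le c)=\sum_{k=0}^{c}\binom{n}{k}p^{k}(1-p)^{n-k}\approx\sum_{k=0}^{c}e^{-np}\frac{(np)^{k}}{k!}=1-G_{c+1}(np,1),$$

$$\Pr(X\le c)\le 1-P^{*}\iff G_{c+1}(np,1)\ge P^{*}\iff n\ge\frac{\gamma_{P^{*}}(c+1)}{G_{r}(t,\theta_{0})}.$$

The approximation is best when p is small, that is, for the smaller values of t/rθ₀.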


TABLE IIA 


The entries in this table are the P*-percentage points of a gamma distribution with parameter r = c + 1 or, equivalently, one-half times the P*-percentage points of a χ² distribution with 2c + 2 degrees of freedom.
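The Table II recipe can be sketched in a few lines. The assumptions are the same as in the Poisson reading: the Table IIA entry is found by bisecting the unit-scale gamma cdf with shape c + 1, and the Table IIB entry is that cdf with shape r evaluated at t/θ₀ = r·(t/rθ₀). `gamma_quantile` and `approx_sample_size` are illustrative names, not functions from any library.

```python
import math


def gamma_cdf(y: float, r: float) -> float:
    """Unit-scale gamma cdf P(r, y) from its power series."""
    if y <= 0.0:
        return 0.0
    term = total = 1.0 / r
    k = 1
    while term > total * 1e-17:
        term *= y / (r + k)
        total += term
        k += 1
    return math.exp(r * math.log(y) - y - math.lgamma(r)) * total


def gamma_quantile(q: float, shape: float) -> float:
    """q-point of a unit-scale gamma distribution (a Table IIA entry), by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < q:   # bracket the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_sample_size(c: int, r: float, p_star: float, ratio: float) -> float:
    """Table IIA entry for (c, P*) divided by Table IIB entry for (r, t/r*theta0)."""
    return gamma_quantile(p_star, c + 1) / gamma_cdf(ratio * r, r)


print(round(gamma_quantile(0.95, 1), 5), round(approx_sample_size(0, 3, 0.95, 1.0), 4))
```

For c = 0, P* = .95, r = 3 and t/rθ₀ = 1.0, the quotient is about 5.19. Table IC, read as the r = 3 case, gives an exact minimum of 4 for the same inputs.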








                         P*
 c       .75         .90         .95         .99
 0     1.38629     2.30259     2.99573     4.60517
 1     2.69263     3.88972     4.74386     6.63835
 2     3.92040     5.32232     6.29579     8.40595
 3     5.10943     6.68078     7.75366    10.04512
 4     6.27443     7.99359     9.15352    11.60463
 5     7.42270     9.27467    10.51304    13.10848
 6     8.55847    10.53207    11.84240    14.57062
 7     9.68443    11.77091    13.14811    15.99996
 8    10.80245    12.99471    14.43465    17.40265
 9    11.91385    14.20599    15.70522    18.78312
10    13.01963    15.40664    16.96222    20.14468
11    14.12058    16.59812    18.20752    21.48991
12    15.21728    17.78159    19.44257    22.82084
13    16.31025    18.95796    20.66857    24.13912
14    17.39987    20.12801    21.88649    25.44609
15    18.48649    21.29237    23.09713    26.74289
16    19.57039    22.45158    24.30118    28.03045
17    20.65181    23.60609    25.49923    29.30961
18    21.73095    24.75629    26.69177    30.58104
19    22.80801    25.90253    27.87924    31.84537
20    23.88313    27.04510    29.06202    33.10312
21    24.95643    28.18427    30.24044    34.35476
22    26.02810    29.32027    31.41481    35.60070
23    27.09818    30.45330    32.58538    36.84132
24    28.16680    31.58356    33.75240    38.07695
25    29.23405    32.71121    34.91608    39.30788





























TABLE IIB 


The entries in this table are the values of the cumulative distribution function G_r(t, θ₀) of a gamma or χ² distribution.





t/rθ₀     r = 1/2        r = 1          r = 2          r = 3          r = 4          r = 5
1.0       6.82690 (-1)   6.32121 (-1)   5.93994 (-1)   5.76810 (-1)   5.66530 (-1)   5.59507 (-1)
.75       6.13524 (-1)   5.27633 (-1)   4.42175 (-1)   3.90661 (-1)   3.52768 (-1)   3.22452 (-1)
.50       5.20500 (-1)   3.93469 (-1)   2.64241 (-1)   1.91153 (-1)   1.42877 (-1)   1.08822 (-1)
.20       3.45279 (-1)   1.81269 (-1)   6.15519 (-2)   2.31153 (-2)   9.07986 (-3)   3.65985 (-3)
.10       2.48170 (-1)   9.51626 (-2)   1.75231 (-2)   3.59949 (-3)   7.76251 (-4)   1.72116 (-4)
.075      2.15809 (-1)   7.22565 (-2)   1.01858 (-2)   1.60519 (-3)   2.65811 (-4)   4.52769 (-5)
.05       1.76937 (-1)   4.87706 (-2)   4.67884 (-3)   5.02862 (-4)   5.68402 (-5)   6.61171 (-6)
.02       1.12463 (-1)   1.98013 (-2)   7.78983 (-4)   3.44182 (-5)   1.60100 (-6)   7.66780 (-8)
.01       7.96558 (-2)   9.95027 (-3)   1.97353 (-4)   4.39995 (-6)   1.03310 (-7)   2.49795 (-9)
.0075     6.90127 (-2)   7.47194 (-3)   1.11381 (-4)   1.86669 (-6)   3.29500 (-8)   5.98976 (-10)
.0050     5.63722 (-2)   4.98752 (-3)   4.96679 (-5)   5.56210 (-7)   6.56088 (-9)   7.97028 (-11)
.0020     3.56707 (-2)   1.99800 (-3)   7.79870 (-6)   3.58384 (-8)   1.69578 (-10)  8.26419 (-13)
.0010     2.52273 (-2)   9.99500 (-4)   1.99734 (-6)   4.48989 (-9)   1.06326 (-11)  2.59334 (-14)
.00075    2.18485 (-2)   7.49720 (-4)   1.12388 (-6)   1.89524 (-9)   3.36691 (-12)  6.16053 (-15)
.00050    1.78399 (-2)   4.99875 (-4)   4.99667 (-7)   5.61867 (-10)  6.65601 (-13)  8.12108 (-16)
.00020    1.12835 (-2)   1.99980 (-4)   7.99787 (-8)   3.59838 (-11)  1.70558 (-14)  8.32639 (-18)
.00010    7.97870 (-3)   9.99950 (-5)   1.99973 (-8)   4.49899 (-12)  1.06633 (-15)  2.60308 (-19)
.000075   6.91000 (-3)   7.49972 (-5)   1.12489 (-8)   1.89812 (-12)  3.37419 (-16)  6.17788 (-20)
.000050   5.64200 (-3)   4.99988 (-5)   4.99967 (-9)   5.62437 (-13)  6.66560 (-17)  8.13633 (-21)
.000020   3.56810 (-3)   1.99998 (-5)   7.99980 (-10)  3.59984 (-14)  1.70656 (-18)  8.33264 (-23)
.000010   2.52310 (-3)   9.99995 (-6)   1.99997 (-10)  4.49990 (-15)  1.06663 (-19)  2.60406 (-24)

t/rθ₀     r = 6          r = 7          r = 8          r = 9          r = 10
1.0       5.54320 (-1)   5.50289 (-1)   5.47039 (-1)   5.44347 (-1)   5.42070 (-1)
.75       2.97070 (-1)   2.78209 (-1)   2.56020 (-1)   2.38945 (-1)   2.23592 (-1)
.50       8.39180 (-2)   6.52881 (-2)   5.11336 (-2)   4.02573 (-2)   3.18281 (-2)
.20       1.50023 (-3)   6.22315 (-4)   2.60440 (-4)   1.09745 (-4)   4.64981 (-5)
.10       3.88561 (-5)   8.88362 (-6)   2.05019 (-6)   4.76584 (-7)   1.11425 (-7)
.075      7.85450 (-6)   1.38015 (-6)   2.44828 (-7)   4.37493 (-8)   7.86336 (-9)
.05       7.83472 (-7)   9.40529 (-8)   1.13997 (-8)   1.39196 (-9)   1.70967 (-10)
.02       3.74225 (-9)   1.85062 (-10)  9.24120 (-12)  4.64935 (-13)  2.35307 (-14)
.01       6.15534 (-11)  1.56980 (-12)  3.87552 (-14)  9.84592 (-16)  2.51635 (-17)
.0075     1.10968 (-11)  2.08325 (-13)  3.94943 (-15)  7.54390 (-17)  1.44960 (-18)
.0050     9.86803 (-13)  1.23808 (-14)  1.56863 (-16)  2.00246 (-18)  2.57158 (-20)
.0020     4.10477 (-15)  2.06607 (-17)  1.05018 (-19)  5.37842 (-22)  2.77103 (-24)
.0010     6.44676 (-17)  1.62404 (-19)  4.13153 (-22)  1.05901 (-24)  2.73079 (-27)
.00075    1.14886 (-17)  2.17115 (-20)  4.14356 (-23)  7.96769 (-26)  1.54130 (-28)
.00050    1.00990 (-18)  1.27267 (-21)  1.61963 (-24)  2.07678 (-27)  2.67894 (-30)
.00020    4.14294 (-21)  2.08898 (-24)  1.06371 (-27)  5.45741 (-31)  2.81674 (-34)
.00010    6.47667 (-23)  1.63301 (-26)  4.15806 (-30)  1.06676 (-33)  2.75298 (-37)
.000075   1.15278 (-23)  2.18000 (-27)  4.16322 (-31)  8.01083 (-35)  1.55069 (-38)
.000050   1.01220 (-24)  1.27613 (-28)  1.62475 (-32)  2.08427 (-36)  2.68980 (-40)
.000020   4.14670 (-27)  2.09124 (-31)  1.06505 (-35)  5.46527 (-40)  2.82130 (-44)
.000010   6.47961 (-29)  1.63390 (-33)  4.16068 (-38)  1.06753 (-42)  2.75546 (-47)





The numbers in parentheses indicate the power of 10 by which the tabulated values are to be multiplied. 
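Every entry here is a gamma cdf value, so the table can be recomputed. The sketch below evaluates the regularized incomplete gamma function by a series, which is our own choice, and prints each value in the table's "mantissa (power of 10)" style. `table_iib_entry` is a hypothetical helper. Its `ratio` argument is t/rθ₀, so the cdf is evaluated at t/θ₀ = r·(t/rθ₀).

```python
import math


def gamma_cdf(y: float, r: float) -> float:
    """Unit-scale gamma cdf P(r, y) from its power series."""
    if y <= 0.0:
        return 0.0
    term = total = 1.0 / r
    k = 1
    while term > total * 1e-17:
        term *= y / (r + k)
        total += term
        k += 1
    return math.exp(r * math.log(y) - y - math.lgamma(r)) * total


def table_iib_entry(r: float, ratio: float) -> str:
    """G_r(t, theta0) at t = ratio * r * theta0, formatted as 'm.mmmmm (k)'."""
    value = gamma_cdf(ratio * r, r)
    mantissa, exponent = f"{value:.5e}".split("e")
    return f"{mantissa} ({int(exponent)})"


for r in (1, 2, 3):
    print(r, [table_iib_entry(r, x) for x in (1.0, 0.75, 0.50)])
```

For example, it returns "6.32121 (-1)" for r = 1 at t/rθ₀ = 1.0, which is 1 − e⁻¹.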







TABLE III 


For selected values of c, r, P* and t/rθ₀, the entries in this table give the relative per cent error of the approximation given by Table II to the exact value of the minimum sample size obtained from Table I.
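The error measure needs both sample sizes. This sketch makes two assumptions of its own: it takes the relative per cent error to be 100·(approximate − exact)/exact, and it leaves the approximate n unrounded. The caption fixes neither choice. The exact n comes from the binomial reading of Table I and the approximate n from the Table II division. All function names are illustrative.

```python
import math


def gamma_cdf(y, r):
    """Unit-scale gamma cdf P(r, y) from its power series."""
    if y <= 0.0:
        return 0.0
    term = total = 1.0 / r
    k = 1
    while term > total * 1e-17:
        term *= y / (r + k)
        total += term
        k += 1
    return math.exp(r * math.log(y) - y - math.lgamma(r)) * total


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(c + 1))


def exact_n(c, r, p_star, ratio):
    """Smallest n with P(X <= c) <= 1 - P* (binomial reading of Table I)."""
    p, alpha = gamma_cdf(ratio * r, r), 1.0 - p_star
    lo, hi = c, c + 1
    while binom_cdf(c, hi, p) > alpha:
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if binom_cdf(c, mid, p) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi


def approx_n(c, r, p_star, ratio):
    """Table II recipe: P*-point of a unit gamma with shape c + 1, divided by G_r."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, c + 1) < p_star:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, c + 1) < p_star:
            lo = mid
        else:
            hi = mid
    return hi / gamma_cdf(ratio * r, r)


def relative_percent_error(c, r, p_star, ratio):
    """Assumed convention: 100 * (approximate - exact) / exact."""
    exact = exact_n(c, r, p_star, ratio)
    return 100.0 * (approx_n(c, r, p_star, ratio) - exact) / exact


print(round(relative_percent_error(0, 3, 0.95, 1.0), 2))
```

With c = 0, r = 3, P* = .95 and t/rθ₀ = 1.0, the approximation (about 5.19) overstates the exact value 4 by roughly 30 per cent. Small sample sizes are where the Poisson reasoning behind Table II is weakest.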








t/rθ₀    1.0















































































TABLE IV 


The entries in this table are θ times the failure rates, expressed in terms of t/θ, for the gamma distribution with parameters r and θ, rθ being the mean life.
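For integer r the failure rate takes a simple form. Both g_r and 1 − G_r carry the factor e^{−t/θ}, which cancels, so θ times the failure rate is a ratio of polynomials in y = t/θ. That form is also numerically stable for large y. This sketch handles integer r only; r = 1/2 would need the incomplete gamma function. `theta_hazard` is a hypothetical name.

```python
import math


def theta_hazard(r: int, y: float) -> float:
    """theta * h(t) for a gamma life distribution with integer shape r >= 1,
    as a function of y = t / theta:
        (y^(r-1) / (r-1)!) / sum_{k < r} y^k / k!
    which is g_r / (1 - G_r) with the common e^(-y) cancelled."""
    if r < 1:
        raise ValueError("this sketch handles integer r >= 1 only")
    numerator = y ** (r - 1) / math.factorial(r - 1)
    denominator = sum(y**k / math.factorial(k) for k in range(r))
    return numerator / denominator


for r in (2, 3, 4, 5):
    print(r, [round(theta_hazard(r, y), 5) for y in (0.2, 1.0, 10.0)])
```

For r = 1 the rate is constant at 1, the exponential case. For r = 2 it reduces to y/(1 + y), for example 1/6 ≈ .16667 at y = .2. For every r ≥ 2 it rises toward 1 as y grows.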










t/θ        r = 2       r = 3       r = 4       r = 5
.0001      .0⁴99990
.0002      .0³19996
.0004      .0³39984
.0006      .0³59964    .0⁶17989
.0008      .0³79936    .0⁶31974
.0010      .0³99900    .0⁶49950
.0020      .0²19960    .0⁵19960
.0040      .0²39841    .0⁵79681
.0060      .0²59642    .0⁴17892
.0080      .0²79365    .0⁴31745
.0100      .0²99010    .0⁴49502    .0⁶16501
.0200      .0¹19608    .0³19604    .0⁵13069
.0400      .0¹38462    .0³76864    .0⁴10248    .0⁶10248
.0600      .0¹56604    .0²16952    .0⁴33904    .0⁶50855
.0800      .0¹74074    .0²29542    .0⁴78773    .0⁵15755
.1000      .0¹90909    .0²45249    .0³15081    .0⁵37702
.2000      .16667      .0¹16393    .0²10917    .0⁴54582
.4000      .28571      .0¹54054    .0²71556    .0³71505
.6000      .37500      .10112      .0¹19824    .0²29648
.8000      .44444      .15094      .0¹38694    .0²76794
1.0000     .50000      .20000      .0¹62500    .0¹15385
2.0000     .66667      .40000      .21053      .0¹95238
4.0000     .80000      .61538      .45070      .31068
6.0000     .85714      .72000      .59016      .46957
8.0000     .88889      .78049      .67546      .57464
10.0000    .90909      .81967      .73206      .64666
20.0000    .95238      .90498      .85782      .81093
40.0000    .97561      .95125      .92692      .90262
60.0000    .98361      .96722      .95085      .93448
80.0000    .98765      .97531      .96297      .95064
100.0000   .99010      .98020      .97030      .96041










TABLE V 


For given sampling plans (n, c, t/rθ₀), the entries in this table give the minimum values of θ/θ₀ in order that the lot be accepted with the producer's risk of .05.
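Under the same binomial reading, the producer's-risk condition asks for the smallest θ/θ₀ at which the plan (n, c, t/rθ₀) accepts a lot with probability at least .95. Acceptance probability rises with θ, so bisection is enough. This is a sketch only. `acceptance_probability` and `min_theta_ratio` are hypothetical names, and the series for the gamma cdf is a numerical choice of ours.

```python
import math


def gamma_cdf(y, r):
    """Unit-scale gamma cdf P(r, y) from its power series."""
    if y <= 0.0:
        return 0.0
    term = total = 1.0 / r
    k = 1
    while term > total * 1e-17:
        term *= y / (r + k)
        total += term
        k += 1
    return math.exp(r * math.log(y) - y - math.lgamma(r)) * total


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(c + 1))


def acceptance_probability(n, c, r, ratio, theta_ratio):
    """P(at most c of n items fail by t) when the true theta is theta_ratio * theta0.

    Since t = ratio * r * theta0, t / theta = ratio * r / theta_ratio."""
    p = gamma_cdf(ratio * r / theta_ratio, r)
    return binom_cdf(c, n, p)


def min_theta_ratio(n, c, r, ratio, producer_risk=0.05):
    """Smallest theta/theta0 at which the plan accepts with probability >= 1 - risk."""
    if n <= c:
        raise ValueError("with n <= c every lot is accepted")
    target = 1.0 - producer_risk
    lo = hi = 1.0
    while acceptance_probability(n, c, r, ratio, hi) < target:
        hi *= 2.0                    # acceptance rises with theta
    while acceptance_probability(n, c, r, ratio, lo) >= target:
        lo /= 2.0
    for _ in range(100):             # bisection on the monotone acceptance curve
        mid = 0.5 * (lo + hi)
        if acceptance_probability(n, c, r, ratio, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


print(min_theta_ratio(4, 0, 1, 1.0), 4 / -math.log(0.95))
```

For r = 1 and c = 0 the answer has the closed form θ/θ₀ = n(t/θ₀)/(−ln .95), and the sketch agrees with it. That gives a quick check before trying other plans.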



















TABLE V—(Continued) 













REFERENCES 


Birnbaum, Z. W. and Saunders, S. C., “A statistical model for life lengths of materials,” Journal of the American Statistical Association, 53 (1958), 153-60.

Campbell, G. A., “Probability curves showing Poisson’s exponential summation,” Bell System Technical Journal, 2 (1923), 95-113.

Chapman, D. G., “Estimating the parameters of a truncated gamma distribution,” Annals of Mathematical Statistics, 27 (1956), 498-506.

Cramér, Harald, Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.

Drenick, R. F., “Mathematical aspects of the reliability problem,” Journal of the Society for Industrial and Applied Mathematics, 8 (1960), 125-49.

Epstein, Benjamin, “Truncated life tests in the exponential case,” Annals of Mathematical Statistics, 25 (1954), 555-64.

Erdélyi, A., et al., Higher Transcendental Functions, Vol. 2. New York: McGraw-Hill Book Co., Inc., 1953. Pp. 133-52.

Goode, Henry P. and Kao, John H. K., “Sampling plans based on the Weibull distribution,” Proceedings of Seventh National Symposium on Reliability and Quality Control. Philadelphia, Pennsylvania, January 1961, 24-40.

Greenwood, J. A. and Durand, D., “Aids for fitting the gamma distribution by maximum likelihood,” Technometrics, 2 (1960), 55-65.

Gupta, Shanti S., “Order statistics from the gamma distribution,” Technometrics, 2 (1960), 243-62.

Harvard Computation Laboratory, Tables of the Cumulative Binomial Probability Distribution. Cambridge, Massachusetts: Harvard University Press, 1955.

Herd, G. R., “Some statistical concepts and techniques for reliability analyses and prediction,” Proceedings of Fifth National Symposium on Reliability and Quality Control. Philadelphia, Pennsylvania, January 1959, 126-36.

Molina, E. C., Poisson’s Exponential Binomial Limit. Princeton, New Jersey: D. Van Nostrand Company, Inc., 1942.

National Bureau of Standards, Tables of the Binomial Probability Distribution. Washington, D. C.: United States Government Printing Office, 1949.

Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, Vol. I. Cambridge, England: University Press, 1954.

Pearson, K., Tables of the Incomplete Γ-Function. Cambridge, England: University Press, 1957.

Sobel, M. and Tischendorf, J. A., “Acceptance sampling with new life test objectives,” Proceedings of Fifth National Symposium on Reliability and Quality Control. Philadelphia, Pennsylvania, January 1959, 108-18.

Wilk, M. B., Gnanadesikan, R., and Huyett, M. J., “Probability plots for the gamma distribution,” unpublished Bell Laboratories Memorandum, 1961. (To appear in Technometrics.)

Wilk, M. B., Gnanadesikan, R., and Huyett, M. J., “Estimation of the parameters of the gamma distribution using order statistics,” unpublished manuscript.





A BIVARIATE EXTENSION OF THE 
EXPONENTIAL DISTRIBUTION* 


JOHN E. FREUND
Arizona State University 


A bivariate extension of the exponential distribution is proposed as a 
model for certain problems in life testing. It applies, in particular, to 
two-component systems, which can function even if one of the com- 
ponents has failed. Various statistical properties of the model are in- 
vestigated, including maximum likelihood estimates of the parameters 
and their distributions. 


1. INTRODUCTION


In a recent paper, E. J. Gumbel [1] proposed several bivariate distributions
whose marginal distributions are exponential; he did not discuss the appro-
priateness of these models to particular physical situations. The purpose of this 
paper is to present a different bivariate extension of the exponential distribu- 
tion, which is designed, in particular, for the life testing of two-component 
systems, which can function even after one of the components has failed. It 
might, thus, apply to the study of engine failures in two-engine planes, to the 
wear of two pens on an executive’s desk, or to the performance of a person’s 
eyes, ears, kidneys, or other paired organs. 

To introduce the model, suppose that X and Y are random variables repre- 
senting the lifetimes of two components A and B in a two-component system, 
that X* represents the lifetime of component A if component B is replaced with
a component of the same kind each time it fails (if necessary more than once), 
and that Y* represents the lifetime of component B if component A is replaced 
with a component of the same kind each time it fails (if necessary more than 
once). So far as X* and Y* are concerned, we shall assume that they are inde- 
pendent random variables having the exponential distributions 


$$f(x^*) = \alpha e^{-\alpha x^*} \quad \text{for } x^* > 0 \tag{1.1}$$
$$f(y^*) = \beta e^{-\beta y^*} \quad \text{for } y^* > 0 \tag{1.2}$$


with α > 0 and β > 0. To simplify the notation, the parameters used here and later are the reciprocals of the means of the corresponding exponential distributions.

An immediate consequence of these assumptions is that the element of probability that the first failure of an A component occurs at x* and that the B component has not yet failed is

$$\left(\alpha e^{-\alpha x^*}dx^*\right)\left(\int_{x^*}^{\infty}\beta e^{-\beta y^*}dy^*\right) = \alpha e^{-(\alpha+\beta)x^*}dx^*. \tag{1.3}$$


Similarly, the element of probability that the first failure of a B component 
occurs at y* and that the A component has not yet failed is 


$$\beta e^{-(\alpha+\beta)y^*}dy^*. \tag{1.4}$$








* This research was sponsored in part by the Air Research and Development Command, USAF, under contract AF 33(616)-6857.









Note that if (1.3) is integrated on x* from 0 to ∞ the result is α/(α + β), the probability that the first A-failure occurs before the first B-failure, and that if (1.4) is integrated on y* from 0 to ∞ the result is β/(α + β), the probability that the first B-failure occurs before the first A-failure.

Considering now the case where the components are not replaced, the element of probability that component A fails at x and that B has not yet failed is

$$\alpha e^{-(\alpha+\beta)x}dx \tag{1.5}$$

analogous to (1.3), and the element of probability that component B fails at y and that A has not yet failed is

$$\beta e^{-(\alpha+\beta)y}dy \tag{1.6}$$

analogous to (1.4).
In order to complete the bivariate model, let us now specify, furthermore, that the probability density of X given that component B fails first at y is

$$\alpha' e^{-\alpha'(x-y)} \quad \text{for } 0 < y < x \tag{1.7}$$

with α' > 0, and that the probability density of Y given that component A fails first at x is

$$\beta' e^{-\beta'(y-x)} \quad \text{for } 0 < x < y \tag{1.8}$$

with β' > 0. It follows from these assumptions that the joint density of X and Y is

$$f(x,y) = \begin{cases} \alpha\beta' e^{-\beta' y-(\alpha+\beta-\beta')x} & \text{for } 0 < x < y \\ \beta\alpha' e^{-\alpha' x-(\alpha+\beta-\alpha')y} & \text{for } 0 < y < x \end{cases} \tag{1.9}$$

For 0 < x < y the joint density is obtained by multiplying (1.5) by (1.8), omitting the differential, and for 0 < y < x it is, similarly, obtained by multiplying (1.6) by (1.7), omitting the differential.

The random variables X* and Y* will be of no further concern in this paper; 
they were introduced to explain the basic assumptions underlying the model for 
X and Y, primarily to justify (1.5) and (1.6) and to explain the significance of 
the parameters α and β. It should be noted that in the bivariate model which we have introduced the dependence between X and Y is essentially such that the failure of the B component changes the parameter of the exponential life distribution of the A component from α to α', while the failure of the A component changes the parameter of the exponential life distribution of the B component from β to β'.
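The construction translates directly into a sampler. By (1.5) and (1.6), the first failure time is exponential with rate α + β, and A is the first to fail with probability α/(α + β), independently of that time. The survivor then fails after a further exponential time with rate β' (or α'), by (1.8) and (1.7). This is a sketch using Python's `random` module. `sample_freund` is a hypothetical name, `alpha_p` and `beta_p` stand for α' and β', and the parameter values are arbitrary.

```python
import random


def sample_freund(alpha, beta, alpha_p, beta_p, rng):
    """Draw one (X, Y) pair from the bivariate model (1.9)."""
    u = rng.expovariate(alpha + beta)             # time of the first failure
    if rng.random() < alpha / (alpha + beta):     # A is the first to fail
        return u, u + rng.expovariate(beta_p)     # B's rate switches to beta'
    return u + rng.expovariate(alpha_p), u        # B failed first; A's rate is alpha'


rng = random.Random(2024)
pairs = [sample_freund(1.0, 2.0, 3.0, 0.5, rng) for _ in range(100_000)]
a_first = [(x, y) for x, y in pairs if x < y]
print(len(a_first) / len(pairs))                      # should be close to 1/3
print(sum(y - x for x, y in a_first) / len(a_first))  # should be close to 1/beta' = 2
```

The second printed value shows the dependence directly. Once A has failed, B's remaining life has mean 1/β' rather than 1/β.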


2. STATISTICAL PROPERTIES OF THE PROPOSED MODEL 


To study various properties of the bivariate distribution (1.9), let us first derive its moment generating function

$$m(s,t) = \iint e^{sx+ty} f(x,y)\,dx\,dy. \tag{2.1}$$

Substituting for f(x, y) according to (1.9) and integrating over the appropriate regions, it can be shown that





A BIVARIATE EXPONENTIAL DISTRIBUTION 


$$m(s,t) = \frac{1}{\alpha+\beta-s-t}\left[\frac{\alpha'\beta}{\alpha'-s} + \frac{\alpha\beta'}{\beta'-t}\right] \tag{2.2}$$





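which can also be written

$$m(s,t) = (\alpha+\beta)^{-1}\left[1 + \frac{s+t}{\alpha+\beta} + \left(\frac{s+t}{\alpha+\beta}\right)^{2} + \cdots\right] \times \left\{\beta\left[1 + \frac{s}{\alpha'} + \left(\frac{s}{\alpha'}\right)^{2} + \cdots\right] + \alpha\left[1 + \frac{t}{\beta'} + \left(\frac{t}{\beta'}\right)^{2} + \cdots\right]\right\} \tag{2.3}$$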


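Selecting the appropriate coefficients of this power series in s and t, it follows that

$$E(X) = \frac{\alpha'+\beta}{\alpha'(\alpha+\beta)}, \qquad E(Y) = \frac{\beta'+\alpha}{\beta'(\alpha+\beta)},$$
$$\operatorname{var}(X) = \frac{\alpha'^2+2\alpha\beta+\beta^2}{\alpha'^2(\alpha+\beta)^2}, \qquad \operatorname{var}(Y) = \frac{\beta'^2+2\alpha\beta+\alpha^2}{\beta'^2(\alpha+\beta)^2}, \tag{2.4}$$
$$\operatorname{cov}(X,Y) = \frac{\alpha'\beta'-\alpha\beta}{\alpha'\beta'(\alpha+\beta)^2}.$$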
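The formulas (2.4) can be checked against simulated pairs. The sketch below draws pairs from the construction of Section 1: the first failure comes at rate α + β, and the survivor then fails at rate α' or β'. It compares sample moments with (2.4). With 200,000 pairs the agreement should be within Monte Carlo error. All names and parameter values are illustrative.

```python
import random


def freund_moments(a, b, ap, bp):
    """E(X), E(Y), var(X), var(Y), cov(X, Y) from (2.4); ap, bp denote alpha', beta'."""
    s = a + b
    ex = (ap + b) / (ap * s)
    ey = (bp + a) / (bp * s)
    vx = (ap**2 + 2 * a * b + b**2) / (ap**2 * s**2)
    vy = (bp**2 + 2 * a * b + a**2) / (bp**2 * s**2)
    cxy = (ap * bp - a * b) / (ap * bp * s**2)
    return ex, ey, vx, vy, cxy


def sample_freund(a, b, ap, bp, rng):
    """One (X, Y) pair: first failure at rate a + b, survivor at rate ap or bp."""
    u = rng.expovariate(a + b)
    if rng.random() < a / (a + b):
        return u, u + rng.expovariate(bp)
    return u + rng.expovariate(ap), u


def sample_moments(pairs):
    """Sample means, variances and covariance of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / (n - 1)
    vy = sum((y - my) ** 2 for _, y in pairs) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)
    return mx, my, vx, vy, cxy


rng = random.Random(7)
pairs = [sample_freund(1.0, 2.0, 3.0, 0.5, rng) for _ in range(200_000)]
names = ("E(X)", "E(Y)", "var(X)", "var(Y)", "cov(X,Y)")
for name, exact, mc in zip(names, freund_moments(1.0, 2.0, 3.0, 0.5), sample_moments(pairs)):
    print(f"{name:9s} formula {exact:.4f}  simulated {mc:.4f}")
```

With these values α'β' < αβ, so the covariance is negative. The survivor's failure rate drops after the first failure, which lengthens its remaining life.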


Unlike the models proposed by Gumbel [1], the marginal distributions of (1.9) are in general not exponential. Integrating (1.9), in turn, with respect to y and x, it can be seen that


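$$f(x) = \frac{(\alpha-\alpha')(\alpha+\beta)e^{-(\alpha+\beta)x}}{\alpha+\beta-\alpha'} + \frac{\alpha'\beta\,e^{-\alpha' x}}{\alpha+\beta-\alpha'} \tag{2.5}$$

provided α + β − α' ≠ 0, and

$$f(y) = \frac{(\beta-\beta')(\alpha+\beta)e^{-(\alpha+\beta)y}}{\alpha+\beta-\beta'} + \frac{\alpha\beta'\,e^{-\beta' y}}{\alpha+\beta-\beta'} \tag{2.6}$$

provided α + β − β' ≠ 0. If α + β − α' = 0, then

$$f(x) = (\alpha + \alpha'\beta x)\,e^{-\alpha' x} \tag{2.7}$$

and if α + β − β' = 0,

$$f(y) = (\beta + \alpha\beta' y)\,e^{-\beta' y}. \tag{2.8}$$

In the special case where α = α' and β = β', X and Y are independent and f(x) = αe^{−αx} and f(y) = βe^{−βy}.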


The regression equations of Y on X and X on Y are generally not linear. Obtaining the conditional density f(y | x) by dividing the bivariate density by the marginal density of X, it can be shown that

$$f(x)E(Y\mid x) = \frac{\alpha'\beta\,e^{-\alpha' x}}{(\alpha+\beta-\alpha')^{2}} + e^{-(\alpha+\beta)x}\left[\alpha x + \frac{\alpha}{\beta'} - \frac{\alpha'\beta\,[1+(\alpha+\beta-\alpha')x]}{(\alpha+\beta-\alpha')^{2}}\right] \tag{2.9}$$

provided α + β − α' ≠ 0. A similar expression can be obtained for the case where α + β − α' = 0 and for f(y)E(X | y).

The following are some cases which are of special interest: if α' → ∞, that is, if A has not failed prior to B it must fail simultaneously with B, the regression of Y on X is linear, namely,

$$E(Y\mid x) = x + \frac{\alpha}{\beta'(\alpha+\beta)} \tag{2.10}$$

and the correlation coefficient becomes

$$\rho = \frac{\beta'}{\sqrt{\alpha^{2} + 2\alpha\beta + \beta'^{2}}}. \tag{2.11}$$

Similarly, if β' → ∞, then

$$E(X\mid y) = y + \frac{\beta}{\alpha'(\alpha+\beta)} \tag{2.12}$$

and the correlation coefficient becomes

$$\rho = \frac{\alpha'}{\sqrt{\alpha'^{2} + 2\alpha\beta + \beta^{2}}}. \tag{2.13}$$

It is also of interest to note that in general −1/3 < ρ < +1. The correlation coefficient approaches +1 when α' → ∞ and β' → ∞; physically speaking, this corresponds to the case where the two-component system cannot function if either component fails. The correlation coefficient approaches −1/3 when α = β and α' → 0 and β' → 0; physically speaking, this corresponds to the case where either component becomes "almost infallible" as soon as the other one fails. This would not be a very realistic situation. Note that the two limiting cases, ρ = +1 and ρ = −1/3, are excluded under the assumptions of the model.

Another case which is of special interest arises when α = α'. As can be seen from (2.5), the marginal density of X becomes

$$f(x) = \alpha e^{-\alpha x},$$

which is identical with (1.1). If α = α' the marginal density of X is the same as that of X*, but this does not provide any information about the independence of the life of A from that of B. It does not follow that f(x | y) = f(x) in this case, as might have been expected. Analogous arguments apply, of course, to the case where β = β'.








3. ESTIMATES OF THE PARAMETERS OF THE MODEL 


To estimate the parameters α, β, α', and β' by the method of maximum likelihood, suppose that in a random sample of size n from a population having the bivariate density (1.9) the A component fails first r times and the B component fails first n − r times. Furthermore, let us write the sum of the lifetimes of the A components which failed first as Σx, the sum of the lifetimes of the corresponding B components as Σy, the sum of the lifetimes of the B components which failed first as Σ'y, and the sum of the lifetimes of the corresponding A components as Σ'x. Using this notation, the likelihood function of the sample becomes

$$L = (\alpha\beta')^{r}(\alpha'\beta)^{n-r}\exp\left[-(\alpha+\beta-\beta')\textstyle\sum x - \beta'\sum y - \alpha'\sum' x - (\alpha+\beta-\alpha')\sum' y\right] \tag{3.1}$$

Note that when r = 0, L is not a function of β' and this parameter cannot be estimated. Similarly, when r = n, L is not a function of α' and this parameter cannot be estimated.

Differentiating ln L partially with respect to α, β, α', and β', equating these partial derivatives to 0, and solving for the four parameters, we obtain the following set of simultaneous maximum likelihood estimates:

$$\hat\alpha = \frac{r}{\sum x + \sum' y}, \quad \hat\beta = \frac{n-r}{\sum x + \sum' y}, \quad \hat\alpha' = \frac{n-r}{\sum' x - \sum' y}, \quad \hat\beta' = \frac{r}{\sum y - \sum x} \tag{3.2}$$
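The estimates (3.2) need only the four sums. This sketch assumes the data arrive as (x, y) pairs with no ties. It refuses the cases r = 0 and r = n, where β' or α' cannot be estimated. `freund_mle` is a hypothetical name. The test data are simulated from the construction of Section 1 with arbitrary parameter values.

```python
import random


def sample_freund(a, b, ap, bp, rng):
    """One (X, Y) pair: first failure at rate a + b, survivor at rate ap or bp."""
    u = rng.expovariate(a + b)
    if rng.random() < a / (a + b):
        return u, u + rng.expovariate(bp)
    return u + rng.expovariate(ap), u


def freund_mle(pairs):
    """Maximum likelihood estimates (3.2) of alpha, beta, alpha', beta'."""
    a_first = [(x, y) for x, y in pairs if x < y]   # A failed first: r cases
    b_first = [(x, y) for x, y in pairs if y < x]   # B failed first: n - r cases
    n, r = len(pairs), len(a_first)
    if r == 0 or r == n:
        raise ValueError("beta' (r = 0) or alpha' (r = n) cannot be estimated")
    sum_x = sum(x for x, _ in a_first)      # Σx
    sum_y = sum(y for _, y in a_first)      # Σy
    sum_x_p = sum(x for x, _ in b_first)    # Σ'x
    sum_y_p = sum(y for _, y in b_first)    # Σ'y
    alpha = r / (sum_x + sum_y_p)
    beta = (n - r) / (sum_x + sum_y_p)
    alpha_p = (n - r) / (sum_x_p - sum_y_p)
    beta_p = r / (sum_y - sum_x)
    return alpha, beta, alpha_p, beta_p


rng = random.Random(11)
truth = (1.0, 2.0, 3.0, 0.5)
pairs = [sample_freund(*truth, rng) for _ in range(40_000)]
print(freund_mle(pairs))
```

The common denominator Σx + Σ'y of α̂ and β̂ is the total of the first-failure times. This is the form (3.3) uses below.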

To study some of the distribution theory connected with these estimators, it is important to note that the formulas for α̂ and β̂ can also be written

$$\hat\alpha = \frac{r}{\sum_{i=1}^{n} u_i} \quad\text{and}\quad \hat\beta = \frac{n-r}{\sum_{i=1}^{n} u_i}. \tag{3.3}$$

Here u_i = min(x_i, y_i), and it can easily be seen that the u_i are values assumed by independent random variables having the exponential distribution

$$f(u_i) = (\alpha+\beta)\,e^{-(\alpha+\beta)u_i} \tag{3.4}$$

for i = 1, 2, ..., n. Since the probability that an A component fails before the corresponding B component is α/(α + β), it follows that α̂ is the ratio of values assumed by a random variable having a binomial distribution with the parameters n and α/(α + β) and a random variable having a gamma distribution, the n-fold convolution of the exponential distribution (3.4). Since these two random variables are, furthermore, independent, it can be shown that


$$E(\hat\alpha) = \frac{n}{n-1}\,\alpha \tag{3.5}$$

$$\operatorname{var}(\hat\alpha) = \frac{n\alpha\,[\alpha n + \beta(n-1)]}{(n-1)^{2}(n-2)} \tag{3.6}$$

and higher moments can also readily be obtained. Interchanging α and β in (3.5) and (3.6) yields the corresponding mean and variance of the maximum likelihood estimator of β.
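A Monte Carlo check of (3.5) and (3.6) can simulate α̂ through the representation in (3.3). The numerator is a binomial count and the denominator an independent gamma variate with shape n and scale 1/(α + β). The standard library's `random.gammavariate(alpha, beta)` takes shape and scale in that order. The function names, n = 10, α = β = 1 and the number of replications are arbitrary choices for the sketch.

```python
import random


def simulate_alpha_hat(n, alpha, beta, reps, rng):
    """Draws of alpha-hat = R / S as in (3.3): R ~ Binomial(n, alpha/(alpha+beta))
    counts A-first failures, S ~ Gamma(shape n, scale 1/(alpha+beta)) is the sum
    of the u_i, and R and S are independent."""
    pi = alpha / (alpha + beta)
    draws = []
    for _ in range(reps):
        r = sum(rng.random() < pi for _ in range(n))
        s = rng.gammavariate(n, 1.0 / (alpha + beta))
        draws.append(r / s)
    return draws


def alpha_hat_moments(n, alpha, beta):
    """Exact mean (3.5) and variance (3.6) of alpha-hat."""
    mean = n * alpha / (n - 1)
    var = n * alpha * (alpha * n + beta * (n - 1)) / ((n - 1) ** 2 * (n - 2))
    return mean, var


rng = random.Random(5)
draws = simulate_alpha_hat(10, 1.0, 1.0, 40_000, rng)
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
print(alpha_hat_moments(10, 1.0, 1.0), (m, v))
```

The bias factor n/(n − 1) in (3.5) comes entirely from E(1/S) = (α + β)/(n − 1). Multiplying α̂ by (n − 1)/n therefore gives an unbiased estimator.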

So far as the estimation of α' and β' is concerned, it is important to note that the formulas for α̂' and β̂' can also be written

$$\hat\alpha' = \frac{n-r}{\sum'(x-y)} \quad\text{and}\quad \hat\beta' = \frac{r}{\sum(y-x)}, \tag{3.7}$$

where Σ(y − x) extends over the cases where the A component fails first while Σ'(x − y) extends over the cases where the B component fails first. When the A component fails first, y − x is a value assumed by a random variable having the exponential distribution (1.8). Hence, β̂' is the ratio of r to a value assumed by a random variable having a gamma distribution, the r-fold convolution of the exponential distribution (1.8), and, keeping r fixed, it can be shown that

$$E_r(\hat\beta') = \frac{r}{r-1}\,\beta' \quad\text{for } r > 1 \tag{3.8}$$

$$\operatorname{var}_r(\hat\beta') = \frac{r^{2}\beta'^{2}}{(r-1)^{2}(r-2)} \quad\text{for } r > 2. \tag{3.9}$$

Higher moments can also readily be obtained. (The subscript r is used here to indicate that r is held fixed.) Substituting n − r for r and α' for β', (3.8) and (3.9) yield the corresponding mean and variance of the maximum likelihood estimator of α'.

It is also of interest to note that since α̂' and β̂' are the reciprocals of means of random samples of size n − r and r from populations having, respectively, the exponential distributions (1.7) and (1.8),

$$E\left(\frac{1}{\hat\alpha'}\right) = \frac{1}{\alpha'} \quad\text{and}\quad E\left(\frac{1}{\hat\beta'}\right) = \frac{1}{\beta'}. \tag{3.10}$$

These expectations are unconditional, that is, r is not being held fixed. Excluding the case where r = 0 for the estimation of β' and the case where r = n for the estimation of α', one can use the methods developed by Mendenhall and Lehmann [2] to obtain the following asymptotic expressions for var(1/α̂') and var(1/β̂'):

$$\operatorname{var}\left(\frac{1}{\hat\alpha'}\right) = \frac{1}{n\alpha'^{2}}\left(1+\frac{\alpha}{\beta}\right) + O(n^{-2}) \tag{3.11}$$

$$\operatorname{var}\left(\frac{1}{\hat\beta'}\right) = \frac{1}{n\beta'^{2}}\left(1+\frac{\beta}{\alpha}\right) + O(n^{-2}). \tag{3.12}$$

These methods can also be used to obtain asymptotic expressions for the mean and the variance of the distribution of 1/α̂ and the distribution of 1/β̂. For 1/α̂ they are

$$E\left(\frac{1}{\hat\alpha}\right) = \frac{1}{\alpha} + O(n^{-1}) \tag{3.13}$$

$$\operatorname{var}\left(\frac{1}{\hat\alpha}\right) = \frac{1}{n\alpha^{2}}\left(1+\frac{\beta}{\alpha}\right) + O(n^{-2}) \tag{3.14}$$

where r = 0 is again excluded. (In other words, r is treated as a positive binomial variate with parameters n and α/(α + β).) These results are important inasmuch as the reciprocals of the original parameters are analogous to the means of exponential distributions.


REFERENCES 
[1] Gumbel, E. J., “Bivariate Exponential Distributions,” Journal of the American Statistical Association, 55 (1960), 698-707.
[2] Mendenhall, W. and Lehmann, E. H., “An Approximation to the Negative Moment of the Positive Binomial Useful in Life Testing,” Technometrics, 2 (1960), 227-42.











ON THE RESOLUTION OF STATISTICAL HYPOTHESES 


ROBERT V. HOGG
University of Iowa 


Let ω₀ be the space of a parameter θ. Let ωᵢ be a subset of ωᵢ₋₁, i = 1, 2, ..., k. We test θ ∈ ωₖ against θ ∈ ω₀ − ωₖ by testing iteratively the following hypotheses: θ ∈ ωᵢ against θ ∈ ωᵢ₋₁ − ωᵢ, i = 1, 2, ..., k. The hypothesis θ ∈ ωₖ is accepted if and only if all of the intermediate hypotheses are accepted. If the test statistic for each intermediate hypothesis is based on the corresponding likelihood ratio λᵢ, we demonstrate why, under fairly general conditions, these test statistics are mutually stochastically independent. This argument is based on an independence theorem which deals with complete sufficient statistics. A number of illustrative examples are given; these include the equality of means and variances, the analysis of variance, the independence of p variates, and a regression problem.


1. INTRODUCTION 


In 1928 J. Neyman and E. S. Pearson [8] proposed, as a test criterion, the likelihood ratio. Since that time many statisticians have considered special cases of a type of resolution of a compound statistical hypothesis that is discussed in this paper. Some of these earlier considerations were by Neyman and
Pearson [9, 10], Pearson and Wilks [11], and Sukhatme [14]. More recently, 
Roy and Bargmann [12] used a step down procedure to test the overall inde- 
pendence among p variates having a multivariate normal distribution; this in- 
teresting procedure is actually a special case of the type of resolution examined 
here (see Example 3.3). In each of these investigations, it was found that the 
test of the compound hypothesis under consideration can be based upon two, 
or more, stochastically independent statistics. It is the purpose of this paper to 
demonstrate why, with this type of resolution and under fairly general condi- 
tions, we always have the independence of the test statistics. In addition, we 
illustrate this discussion with several examples which, we hope, have their in- 
teresting aspects. 

Before we formulate the general problem, let us begin with an example which is a well known resolution, reference [9], of the hypothesis of the equality of two independent normal distributions with means μ₁ and μ₂ and variances σ₁² and σ₂², respectively. That is, if the total parameter space for θ = (μ₁, μ₂, σ₁², σ₂²) is

$$\Omega = \{(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2): -\infty < \mu_i < \infty,\ 0 < \sigma_i^2,\ i = 1, 2\}$$

and if the restricted parameter space is

$$\omega = \{(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2): -\infty < \mu_1 = \mu_2 < \infty,\ 0 < \sigma_1^2 = \sigma_2^2\},$$

we wish to test

$$H_0: \theta\in\omega \quad\text{against}\quad H_1: \theta\in\Omega-\omega.$$

One possible statistic on which to base this test is the likelihood ratio

$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)},$$








RESOLUTION OF STATISTICAL HYPOTHESES 979 


where L is the likelihood function (the joint probability density function of the items of the random samples from these two distributions) and where L(ω̂) and L(Ω̂) represent, respectively, the maxima of L in ω and Ω. The null hypothesis H₀ is rejected if the computed likelihood ratio λ < λ₀, a given constant, or if −2 ln λ > −2 ln λ₀ = c. The significance level of such a test can be determined by considering the distribution of −2 ln λ. Of course, from Wald [16] and Wilks' Theorem [17], we know that this distribution is approximately chi-square with two degrees of freedom, provided the sample sizes are large.

Suppose, however, the null hypothesis H₀ is rejected. Frequently, this decision is not wholly satisfactory to the experimenter because he would like to know why H₀ was rejected. Are the variances unequal, are the means unequal, or both? Hence, in practice, the test is usually carried out by first testing the equality of variances and then, if that hypothesis is accepted, by testing the equality of means assuming that the variances are equal. That is, if we denote Ω by ω₀ and ω by ω₂, and let

$$\omega_1 = \{(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2): -\infty < \mu_i < \infty,\ i = 1, 2,\ 0 < \sigma_1^2 = \sigma_2^2\},$$

we first test

$$H_0^1: \theta\in\omega_1 \quad\text{against}\quad H_1^1: \theta\in\omega_0-\omega_1$$

and then, if H₀¹ is accepted, test

$$H_0^2: \theta\in\omega_2 \quad\text{against}\quad H_1^2: \theta\in\omega_1-\omega_2.$$


We reject H₀ if either H₀¹ or H₀² is rejected. In practice, these tests are based on an F statistic and a t statistic, respectively. If σ₁² = σ₂², these statistics are stochastically independent so that if each test is performed at the significance level α = .01, say, the over-all significance level for H₀ against H₁ is equal to 1 − (.99)² = .02, approximately.

Before we turn to the general problem, let us note two things about this example. First, if H₀¹ is rejected we would not test H₀² against H₁². However, some experimenters at this point desire also a comparison of the means (even though H₀ has been rejected). For this purpose, they frequently use some statistic that has been suggested as a solution to the Behrens-Fisher problem. However, such a statistic is stochastically dependent on the F statistic that is used to test the equality of variances and, more importantly, this Behrens-Fisher test is actually suggested by the data. Consequently, there is some question as to the meaning of any attached probability statement. Accordingly, under these conditions, the analysis of the difference of the means is treated as a guiding investigation (possibly a preliminary investigation to some future experiment). Second, let us note, in this example, that


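$$\lambda = \frac{L(\hat\omega_2)}{L(\hat\omega_0)} = \left[\frac{L(\hat\omega_1)}{L(\hat\omega_0)}\right]\left[\frac{L(\hat\omega_2)}{L(\hat\omega_1)}\right]$$

or, if λᵢ is the likelihood ratio for testing H₀ⁱ against H₁ⁱ, i = 1, 2, then

$$\lambda = \prod_{i=1}^{2}\lambda_i \quad\text{and}\quad -2\ln\lambda = \sum_{i=1}^{2}(-2\ln\lambda_i).$$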


Since here the statistics −2 ln λ₁ and −2 ln λ₂ are functions of the F and t statistics, respectively, they are stochastically independent and the tests based on them are essentially equivalent to those based on the F and t statistics.
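The factorization can be confirmed numerically for the two-normal example. The sketch below computes the three maximized log-likelihoods in closed form, using the sample means and the maximum likelihood variances SS/n. It then checks three things: that −2 ln λ = (−2 ln λ₁) + (−2 ln λ₂); that −2 ln λ₂ = N ln[1 + t²/(N − 2)] with N = n₁ + n₂; and that −2 ln λ₁ depends on the data only through F. The last two are the senses in which λ₂ and λ₁ are functions of t and F. The function name and the simulated samples are illustrative.

```python
import math
import random


def two_sample_lr(x1, x2):
    """-2 ln(lambda), -2 ln(lambda_1), -2 ln(lambda_2), t and F for the
    two-normal example: lambda_1 tests equal variances, lambda_2 tests
    equal means given equal variances."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    m = (sum(x1) + sum(x2)) / n
    ss0 = sum((v - m) ** 2 for v in x1 + x2)
    # Maximized log-likelihoods, each without the shared -(n/2)(ln 2*pi + 1) term.
    ll_omega0 = -0.5 * (n1 * math.log(ss1 / n1) + n2 * math.log(ss2 / n2))
    ll_omega1 = -0.5 * n * math.log((ss1 + ss2) / n)
    ll_omega2 = -0.5 * n * math.log(ss0 / n)
    m2ll = -2.0 * (ll_omega2 - ll_omega0)
    m2ll1 = -2.0 * (ll_omega1 - ll_omega0)
    m2ll2 = -2.0 * (ll_omega2 - ll_omega1)
    sp2 = (ss1 + ss2) / (n - 2)                       # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    f = (ss1 / (n1 - 1)) / (ss2 / (n2 - 1))
    return m2ll, m2ll1, m2ll2, t, f


rng = random.Random(3)
x1 = [rng.gauss(0.0, 1.0) for _ in range(12)]
x2 = [rng.gauss(0.5, 1.3) for _ in range(15)]
m2ll, m2ll1, m2ll2, t, f = two_sample_lr(x1, x2)
n = len(x1) + len(x2)
print(m2ll, m2ll1 + m2ll2)
print(m2ll2, n * math.log(1 + t * t / (n - 2)))
```

The additivity is exact algebra, because the ω₁ maximum cancels in the product λ₁λ₂. Independence of the two pieces is the separate, distributional claim that the paper goes on to justify.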

We now proceed to a formulation of the general problem. Let Ω = ω₀ be the total parameter space of the parameter θ, which is usually a vector. Suppose ω is a subset of Ω and it is desired to test H₀: θ ∈ ω against H₁: θ ∈ Ω − ω. Let us say that there are certain intermediate hypotheses in which the investigator is interested; that is, suppose the null hypothesis can be resolved as follows. Let ωᵢ be a subset of ωᵢ₋₁, i = 1, 2, ..., k, so that

$$\Omega = \omega_0 \supset \omega_1 \supset \omega_2 \supset \cdots \supset \omega_k = \omega,$$

where each ωᵢ corresponds to an intermediate hypothesis. Let us test H₀ against H₁ by testing iteratively the following hypotheses:

$$H_0^i: \theta\in\omega_i \quad\text{against}\quad H_1^i: \theta\in\omega_{i-1}-\omega_i,$$

i = 1, 2, ..., k. The test of H₀ⁱ is not made unless, of course, H₀ⁱ⁻¹ is accepted. Then H₀ is accepted if and only if all of H₀¹, ..., H₀ᵏ are accepted. If, here, L denotes the likelihood function and if L(ω̂ᵢ) represents the maximum of L in ωᵢ, i = 0, 1, ..., k, then

$$\lambda_i = \frac{L(\hat\omega_i)}{L(\hat\omega_{i-1})}, \qquad i = 1, 2, \ldots, k,$$

is the likelihood ratio for testing H₀ⁱ against H₁ⁱ. Thus the likelihood ratio for testing H₀ against H₁ is

$$\lambda = \frac{L(\hat\omega_k)}{L(\hat\omega_0)} = \prod_{i=1}^{k}\lambda_i \quad\text{and}\quad -2\ln\lambda = \sum_{i=1}^{k}(-2\ln\lambda_i).$$


Before the question of independence is examined, a few observations should be made. This type of resolution is certainly not unique, and different resolutions could possibly lead to different decisions. In many instances, however, there is a natural way to effect the resolution. In other cases, the resolution can be made so as to test first the more important intermediate hypotheses, since the tests stop once any $H_0^i$ is rejected. But, in any event, if the compound hypothesis is resolved with some care, this sequence of tests can frequently be based on well-known statistics which are mutually stochastically independent. These remarks are illustrated later.
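The stopping rule can be sketched as follows. Each intermediate test is represented by a hypothetical zero-argument callable that returns a p-value, so the test of $H_0^i$ is carried out only when $H_0^1, \ldots, H_0^{i-1}$ have all been accepted. This interface is an illustration and is not part of the paper.

```python
from typing import Callable, Optional, Sequence

def resolved_test(tests: Sequence[Callable[[], float]],
                  alphas: Sequence[float]) -> Optional[int]:
    """Test H_0^1, H_0^2, ... in order, stopping at the first rejection.

    tests[i-1] is a callable returning the p-value of H_0^i against H_1^i. It is
    invoked only if every earlier hypothesis has been accepted.
    Returns the 1-based index of the first rejected H_0^i, or None when all of
    H_0^1, ..., H_0^k (and therefore H_0) are accepted.
    """
    for i, (run_test, alpha) in enumerate(zip(tests, alphas), start=1):
        if run_test() < alpha:
            return i
    return None
```

Because the callables are evaluated lazily, putting the most important intermediate hypothesis first guarantees that it is always tested.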





RESOLUTION OF STATISTICAL HYPOTHESES 981 


2. INDEPENDENCE 


The notation used here is the same as that of the preceding section. Let us consider the likelihood ratio $\lambda_i$ for testing $H_0^i$ against $H_1^i$, $i = 1, 2, \ldots, k$. Frequently, if $\theta \in \omega_i$, the distribution of $\lambda_i$ does not depend upon certain parameters that are classified as nuisance parameters with respect to the hypothesis $H_0^i$. For instance, in the introductory example, in which we test $H_0^1: \sigma_1^2 = \sigma_2^2$ against $H_1^1: \sigma_1^2 \neq \sigma_2^2$, we find that $\lambda_1$, or the corresponding F statistic, has a distribution under $H_0^1$ that does not depend upon the nuisance parameters, namely, the two means $\mu_1$ and $\mu_2$ and the common variance $\sigma_1^2 = \sigma_2^2$. In general, let us assume further that, under $H_0^i: \theta \in \omega_i$, there exist complete sufficient statistics (reference [7]) for these nuisance parameters. Then, in accordance with an Independence Theorem due to Basu [1] and to Hogg and Craig [4], $\lambda_i$ is stochastically independent of these complete sufficient statistics. Usually the likelihood ratios $\lambda_{i+1}, \ldots, \lambda_k$, which follow $\lambda_i$, are functions of these complete sufficient statistics and hence are stochastically independent of $\lambda_i$, $i = 1, 2, \ldots, k-1$. Thus, if these conditions are satisfied, the likelihood ratios $\lambda_1, \lambda_2, \ldots, \lambda_k$ are mutually stochastically independent. To illustrate this, let us again return to the introductory example. There $\lambda_2$, or the corresponding t statistic, is a function of the complete sufficient statistics for $\mu_1$, $\mu_2$, and $\sigma_1^2 = \sigma_2^2$; that is, $\lambda_2$ is a function of the two sample means and the estimate of the common variance. Hence $\lambda_2$ is stochastically independent of $\lambda_1$ (and thus t of F).

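Thus, if certain conditions exist, we can use the Independence Theorem to establish the mutual independence of $\lambda_1, \lambda_2, \ldots, \lambda_k$. Further suppose that the test of $H_0^i$ against $H_1^i$ is made at the significance level $\alpha_i$, $i = 1, 2, \ldots, k$. The overall hypothesis $H_0$ is accepted if, and only if, all of $H_0^1, \ldots, H_0^k$ are accepted, and by independence the probability under $H_0$ of accepting all of them is $\prod_{i=1}^{k}(1 - \alpha_i)$. Hence the test of $H_0$ against $H_1$ has the significance level

$$1 - \prod_{i=1}^{k}(1 - \alpha_i), \quad\text{or}\quad 1 - (1 - \alpha)^k \ \text{ if } \alpha_1 = \alpha_2 = \cdots = \alpha_k = \alpha.$$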
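As a check on the formula $1 - \prod(1 - \alpha_i)$, the following Monte Carlo sketch runs the introductory two-sample example under $H_0$. It first applies an equal-tailed F test of $\sigma_1^2 = \sigma_2^2$ and, only if that is accepted, a pooled t test of $\mu_1 = \mu_2$, each at $\alpha = .05$. The sample sizes, seed, and number of replications are arbitrary choices. The sketch assumes SciPy's `scipy.stats` is available.

```python
import numpy as np
from scipy import stats

def overall_level(alphas):
    """1 - prod(1 - alpha_i): the level of the resolved test when the lambda_i are
    mutually independent under H_0."""
    return 1.0 - np.prod(1.0 - np.asarray(alphas, dtype=float))

def simulate_two_stage(n1=10, n2=12, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo rejection rate of 'F test, then pooled t test' when both samples
    come from the same normal distribution."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n1))
    y = rng.normal(size=(reps, n2))
    vx, vy = x.var(axis=1, ddof=1), y.var(axis=1, ddof=1)
    F = vx / vy
    p_F = 2.0 * np.minimum(stats.f.cdf(F, n1 - 1, n2 - 1), stats.f.sf(F, n1 - 1, n2 - 1))
    sp2 = ((n1 - 1) * vx + (n2 - 1) * vy) / (n1 + n2 - 2)
    t = (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    p_t = 2.0 * stats.t.sf(np.abs(t), n1 + n2 - 2)
    # H_0 is rejected if H_0^1 is rejected, or if H_0^1 is accepted and H_0^2 is rejected.
    reject = (p_F < alpha) | ((p_F >= alpha) & (p_t < alpha))
    return reject.mean()
```

With two stages at .05 each, the simulated rate should be close to $1 - .95^2 = .0975$.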


3. EXAMPLES 


We illustrate the preceding discussion by the following examples. 


Example 3.1. The Equality of m Independent Normal Distributions. 


Let $\mu_1, \ldots, \mu_m$ and $\sigma_1^2, \ldots, \sigma_m^2$ be, respectively, the means and variances of $m$ independent normal distributions. Let $\Omega = \omega_0$ be the $2m$-dimensional total parameter space. Let $\omega = \omega_2$ be that two-dimensional subspace of $\Omega$ that specifies the equality of the $m$ means and the equality of the $m$ variances. Let $\omega_1$ be that $(m+1)$-dimensional subspace of $\Omega$ that specifies the equality of the $m$ variances but in no way restricts the means. If a random sample of size $n$ is taken from each of these $m$ distributions and if $\lambda_1 = L(\hat\omega_1)/L(\hat\omega_0)$ and $\lambda_2 = L(\hat\omega_2)/L(\hat\omega_1)$, then it is well known that $-2\ln\lambda_1$ and $-2\ln\lambda_2$ are, respectively, functions of Bartlett's test statistic and a certain F statistic. If the distribution variances are equal, it is easy to show, using the Independence Theorem, that these two statistics are stochastically independent. Suppose, however, that the investigator wants to consider some additional intermediate hypotheses. For instance, he may insert the following subspaces between $\omega_0$ and $\omega_1$:

$$\Omega = \omega_0 = \omega_{01} \supset \omega_{02} \supset \omega_{03} \supset \cdots \supset \omega_{0m} = \omega_1,$$










where the subspace $\omega_{0i}$ specifies the equality of the first $i$ variances, $i = 2, \ldots, m$. Then, if $V_1, \ldots, V_m$ are the sample variances, the likelihood ratios

$$\lambda_{0i} = \frac{L(\hat\omega_{0i})}{L(\hat\omega_{0,i-1})}, \qquad i = 2, \ldots, m,$$

are functions of

$$R_i = \frac{V_i}{\sum_{j=1}^{i-1} V_j}, \qquad i = 2, \ldots, m,$$

respectively. Each $R_i$, $i = 2, \ldots, m$, multiplied by an appropriate constant, has an F distribution when $\sigma_1^2 = \cdots = \sigma_i^2$. Moreover, $R_2$ has a distribution which does not depend upon $\sigma_1^2 = \sigma_2^2, \sigma_3^2, \ldots, \sigma_m^2, \mu_1, \mu_2, \ldots, \mu_m$. Thus $R_2$ is stochastically independent of the complete sufficient statistics for these $2m - 1$ parameters. Since $R_3, \ldots, R_m$ are functions of these complete sufficient statistics, they are independent of $R_2$. By a similar argument, we can show that $R_4, \ldots, R_m$ are stochastically independent of $R_3$, and so on. Hence the ratios $R_2, \ldots, R_m$ are mutually stochastically independent under $\omega_{0m} = \omega_1$. Thus, using this particular resolution of the equality of the distribution variances, the Bartlett test can be replaced by a succession of independent F tests which, if the equality of the variances is rejected, indicates one place where the inequality may exist. Of course, here, the distributions can be ordered, prior to observing the data, so as to test first the more important intermediate hypotheses.
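The succession of variance tests can be sketched as follows for $m$ samples of a common size $n$. When the first $i$ variances are equal, the quantities $(n-1)V_j/\sigma^2$ are independent chi-square variables with $n-1$ degrees of freedom. Hence $(i-1)R_i$ has an F distribution with $n-1$ and $(i-1)(n-1)$ degrees of freedom. The sketch uses equal-tailed p-values; this is an assumption, since the text does not say how the critical region is split.

```python
import numpy as np
from scipy import stats

def stepwise_variance_tests(samples):
    """Equal-size samples -> list of (i, F_i, df1, df2, p_i), i = 2..m, with
    F_i = (i - 1) R_i and R_i = V_i / (V_1 + ... + V_{i-1})."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    n = samples[0].size
    if any(s.size != n for s in samples):
        raise ValueError("this sketch assumes a common sample size n")
    V = np.array([s.var(ddof=1) for s in samples])      # sample variances
    results = []
    for i in range(2, len(V) + 1):
        R = V[i - 1] / V[: i - 1].sum()
        F = (i - 1) * R
        df1, df2 = n - 1, (i - 1) * (n - 1)
        # Equal-tailed two-sided p-value (an assumption of this sketch).
        p = 2.0 * min(stats.f.cdf(F, df1, df2), stats.f.sf(F, df1, df2))
        results.append((i, F, df1, df2, p))
    return results

rng = np.random.default_rng(2)
samples = [rng.normal(scale=s, size=15) for s in (1.0, 1.0, 1.0, 3.0)]
results = stepwise_variance_tests(samples)
```

Because $R_i$ is a ratio of sample variances of equal-size samples, the divisor used for $V_j$ cancels.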

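A similar resolution of the hypothesis of the equality of the means can be effected. That is, insert the following subspaces between $\omega_1$ and $\omega_2$:

$$\omega_1 = \omega_{11} \supset \omega_{12} \supset \omega_{13} \supset \cdots \supset \omega_{1m} = \omega_2,$$

where the subspace $\omega_{1i}$ specifies, in addition to the equality of the variances, the equality of the first $i$ means, $i = 2, \ldots, m$. Then, if $W_2, W_3, \ldots, W_m$ are the Helmert statistics

$$W_2 = \frac{\bar X_1 - \bar X_2}{\sqrt{2}}, \quad W_3 = \frac{\bar X_1 + \bar X_2 - 2\bar X_3}{\sqrt{6}}, \quad \ldots, \quad W_m = \frac{\bar X_1 + \cdots + \bar X_{m-1} - (m-1)\bar X_m}{\sqrt{m(m-1)}},$$

where $\bar X_1, \ldots, \bar X_m$ are the sample means, the likelihood ratios

$$\lambda_{1i} = \frac{L(\hat\omega_{1i})}{L(\hat\omega_{1,i-1})}, \qquad i = 2, \ldots, m,$$

are functions of

$$S_i = \frac{W_i^2}{\sum_{j=1}^{m} V_j + \sum_{j=2}^{i-1} W_j^2}, \qquad i = 2, \ldots, m,$$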







where we take $\sum_{j=2}^{i-1} W_j^2 = 0$ when $i = 2$. Each $S_i$, $i = 2, \ldots, m$, multiplied by an appropriate constant, has an F distribution under $\omega_{1m} = \omega_2$. Moreover, it is easy to show, using either the Independence Theorem or a change-of-variable technique, that $R_2, \ldots, R_m, S_2, \ldots, S_m$ are mutually stochastically independent under $\omega_{1m} = \omega_2$. Thus, with this resolution, the standard test for the equality of means can be replaced by a succession of independent F tests.
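A sketch of the mean tests follows. It assumes equal sample sizes $n$ and takes $V_j$ to be the sample variance with divisor $n$. Under that convention, $nW_i^2/\sigma^2$ and $n(\sum_j V_j + \sum_{j<i} W_j^2)/\sigma^2$ are independent chi-square variables. So $[m(n-1) + i - 2]\,S_i$ has an F distribution with 1 and $m(n-1) + i - 2$ degrees of freedom. For $m = 2$ the single test reduces to the pooled two-sample t test ($F = t^2$), which the check uses.

```python
import numpy as np
from scipy import stats

def helmert_contrasts(means):
    """W_i = [xbar_1 + ... + xbar_{i-1} - (i-1) xbar_i] / sqrt(i(i-1)), i = 2..m."""
    means = np.asarray(means, dtype=float)
    return np.array([(means[: i - 1].sum() - (i - 1) * means[i - 1]) / np.sqrt(i * (i - 1))
                     for i in range(2, means.size + 1)])

def stepwise_mean_tests(samples):
    """Equal-size samples -> list of (i, F_i, 1, df2_i, p_i), i = 2..m."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    m, n = len(samples), samples[0].size
    means = np.array([s.mean() for s in samples])
    V = np.array([s.var(ddof=0) for s in samples])  # divisor n, as assumed above
    W = helmert_contrasts(means)                    # W[0] holds W_2
    results = []
    for i in range(2, m + 1):
        S = W[i - 2] ** 2 / (V.sum() + np.sum(W[: i - 2] ** 2))  # empty sum is 0 for i = 2
        df2 = m * (n - 1) + i - 2
        F = df2 * S
        results.append((i, F, 1, df2, stats.f.sf(F, 1, df2)))
    return results

rng = np.random.default_rng(3)
samples = [rng.normal(loc=mu, size=10) for mu in (0.0, 0.0, 0.5, 1.0)]
results = stepwise_mean_tests(samples)
```

Because the Helmert contrasts are orthonormal, $\sum_i W_i^2$ equals the sum of squared deviations of the sample means about their average.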

It is not the purpose of this paper to compare this procedure for testing the equality of means with the multiple comparison methods of Scheffé [13] and of Tukey [15]. We simply want to outline here a method that yields independent test statistics. However, if the statistician knew, prior to performing the experiment, the precise sequence $\omega_{11}, \omega_{12}, \ldots, \omega_{1m}$ of interest to him, it may be that the above procedure compares favorably with those multiple comparison methods. A similar comment can be made about the comparison of our proposal for testing the equality of variances with Hartley's maximum F-ratio test [2].


Example 3.2. The Two Factor Analysis of Variance 

Let us consider the usual two-factor analysis of variance having a model with fixed effects and with independently normally distributed observations having a common unknown variance. In this example, we take an equal number (greater than one) of observations in each cell. Let $\Omega$ denote the total parameter space and $\omega$ denote the subspace specifying the equality of all of the cell means. Let $\omega_1$ be that subspace that specifies that the interactions are equal to zero; let $\omega_2$ be that subspace that specifies that the row means are equal and the interactions are zero; and let $\omega_3$ be that subspace that specifies that the column means are equal, the row means are equal, and the interactions are zero. Of course,

$$\Omega = \omega_0 \supset \omega_1 \supset \omega_2 \supset \omega_3 = \omega.$$

Let $Q$, $Q_1$, $Q_2$, $Q_3$, and $Q_4$ represent the quadratic forms commonly associated with the sources "total," "within" (or "error"), "interaction," "row," and "column," respectively, so that

$$Q = Q_1 + Q_2 + Q_3 + Q_4.$$

Then the hypothesis as resolved above can be tested by a succession of three F tests based on the ratios

$$\frac{Q_2}{Q_1}, \quad \frac{Q_3}{Q_1 + Q_2}, \quad\text{and}\quad \frac{Q_4}{Q_1 + Q_2 + Q_3}. \tag{1}$$

Note that the common practice is to use tests based on the ratios

$$\frac{Q_2}{Q_1}, \quad \frac{Q_3}{Q_1}\left(\text{or } \frac{Q_3}{Q_1 + Q_2}\right), \quad\text{and}\quad \frac{Q_4}{Q_1}\left(\text{or } \frac{Q_4}{Q_1 + Q_2}\right). \tag{2}$$










While the modification here is slight, it is of interest, since it is easy to prove, using the change-of-variable technique or the Independence Theorem, that the ratios in (1) are mutually stochastically independent, provided the null hypothesis is true, while the ratios in (2) are not mutually stochastically independent. Thus, using the ratios in (1), it is extremely easy to determine an overall significance level for testing $\omega$ against $\Omega - \omega$. Finally, the following observation should be made. While it is quite natural to test first the intermediate hypothesis that the interactions are zero, the order of the "row" and "column" tests can be determined by the relative importance of the corresponding hypotheses.
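For a balanced layout, the quadratic forms and the ratios (1) can be computed as below. The array layout (rows, columns, replicates) and the function name are assumptions of this sketch. The degrees of freedom are the usual ones: $ab(r-1)$, $(a-1)(b-1)$, $a-1$, and $b-1$ for within, interaction, rows, and columns.

```python
import numpy as np
from scipy import stats

def resolved_two_way(y):
    """y has shape (a, b, r): a rows, b columns, r > 1 replicates per cell.
    Returns (Q, Q1, Q2, Q3, Q4) and the three F tests based on the ratios (1)."""
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)
    row = y.mean(axis=(1, 2))
    col = y.mean(axis=(0, 2))
    Q = np.sum((y - grand) ** 2)                                         # total
    Q1 = np.sum((y - cell[:, :, None]) ** 2)                             # within
    Q2 = r * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2)   # interaction
    Q3 = b * r * np.sum((row - grand) ** 2)                              # rows
    Q4 = a * r * np.sum((col - grand) ** 2)                              # columns
    d1, d2, d3, d4 = a * b * (r - 1), (a - 1) * (b - 1), a - 1, b - 1
    # Each denominator pools "within" with the sources whose hypotheses were accepted.
    steps = [(Q2, d2, Q1, d1), (Q3, d3, Q1 + Q2, d1 + d2), (Q4, d4, Q1 + Q2 + Q3, d1 + d2 + d3)]
    tests = []
    for num, dn, den, dd in steps:
        F = (num / dn) / (den / dd)
        tests.append((F, dn, dd, stats.f.sf(F, dn, dd)))
    return (Q, Q1, Q2, Q3, Q4), tests

rng = np.random.default_rng(4)
qs, tests = resolved_two_way(rng.normal(size=(3, 4, 2)))
```

The only change from common practice is in the denominators of the second and third ratios.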


Example 3.3. The Independence of p Variates, Roy and Bargmann [12] 


Let $p$ random variables have a nonsingular multivariate normal distribution with unknown means and unknown covariance matrix

$$\begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp} \end{pmatrix}.$$

Let $\Omega$ denote the total parameter space and $\omega$ denote the subspace specifying that $\sigma_{ij} = 0$, $i \neq j$. Let $\omega_1$ be that subspace of $\Omega = \omega_0$ that requires that $\sigma_{1j} = 0$, $j = 2, \ldots, p$; let $\omega_2$ be that subspace of $\omega_1$ that requires $\sigma_{2j} = 0$, $j = 3, \ldots, p$; and so on until $\omega_{p-1}$ is that subspace of $\omega_{p-2}$ that requires $\sigma_{p-1,p} = 0$. Here, of course,

$$\Omega = \omega_0 \supset \omega_1 \supset \omega_2 \supset \cdots \supset \omega_{p-1} = \omega.$$

It is well known that $\lambda_1 = L(\hat\omega_1)/L(\hat\omega_0)$ is a monotone function of the multiple correlation coefficient $r_{1\cdot 23\cdots p}$. Similarly we find that

$$\lambda_i = \frac{L(\hat\omega_i)}{L(\hat\omega_{i-1})}, \qquad i = 2, \ldots, p-1,$$

are, respectively, monotone functions of the multiple correlation coefficients $r_{2\cdot 34\cdots p}, \ldots, r_{(p-1)\cdot p}$. Roy and Bargmann [12] call these multiple correlation coefficients the step-down correlations, and it is easy to show, using standard multivariate techniques, that they are mutually stochastically independent under the null hypothesis. It is interesting to note that the Independence Theorem can be used here too. For example, $r_{1\cdot 23\cdots p}$ has a distribution under $\omega_1$ that does not depend upon $\sigma_{11}$ or $\sigma_{ij}$, $i, j = 2, \ldots, p$. The other step-down correlations are functions of the complete sufficient statistics for these parameters; hence $r_{2\cdot 34\cdots p}, \ldots, r_{(p-1)\cdot p}$ are stochastically independent of $r_{1\cdot 23\cdots p}$. If this process is continued, we can show that $r_{1\cdot 23\cdots p}, r_{2\cdot 34\cdots p}, \ldots, r_{(p-1)\cdot p}$ are mutually stochastically independent.
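The step-down correlations are easy to compute from the sample covariance matrix. For variable $i$, let $S_{(i)}$ be the covariance matrix of variables $i+1, \ldots, p$ and $s_i$ their covariances with variable $i$; then $r_{i\cdot(i+1)\cdots p}^2 = s_i' S_{(i)}^{-1} s_i / s_{ii}$. A useful check, which mirrors $\lambda = \prod\lambda_i$, is that $\prod_i (1 - r_{i\cdot(i+1)\cdots p}^2)$ equals the determinant of the sample correlation matrix. The function below is an illustrative sketch.

```python
import numpy as np

def step_down_correlations(x):
    """x: (N, p) data matrix. Returns [r_{1.23...p}, r_{2.34...p}, ..., r_{(p-1).p}]."""
    S = np.cov(np.asarray(x, dtype=float), rowvar=False)
    p = S.shape[0]
    r = []
    for i in range(p - 1):
        s_i = S[i, i + 1:]                    # covariances of variable i with i+1..p
        S_rest = S[i + 1:, i + 1:]            # covariance matrix of variables i+1..p
        R2 = s_i @ np.linalg.solve(S_rest, s_i) / S[i, i]
        r.append(np.sqrt(R2))
    return np.array(r)

rng = np.random.default_rng(5)
x = rng.normal(size=(50, 4))
r = step_down_correlations(x)
```

For $p = 2$ the single step-down correlation is the absolute value of the ordinary correlation coefficient.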


Example 3.4. A Regression Problem 


Let us assume that $X_{ij}$ has a normal distribution with unknown variance $\sigma^2$ and mean $b_0 + b_1\phi_1(z_i) + b_2\phi_2(z_i) + b_3\phi_3(z_i)$, $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$.







Here $z_1, \ldots, z_k$ are known equally spaced constants; $\phi_i(z)$ is the orthogonal polynomial of $i$th degree, $i = 1, 2, 3$; and $b_0, b_1, b_2, b_3$ are unknown parameters. We are frequently interested in testing $\omega: b_1 = 0, b_2 = 0, b_3 = 0$ against all alternatives. However, if this hypothesis is rejected, we usually want to know the degree of the mean polynomial. This suggests the resolution in which $\omega_1$ is that subspace of the total parameter space that requires $b_3 = 0$, in which $\omega_2$ is that subspace of $\omega_1$ that requires $b_2 = 0$, and in which $\omega_3 = \omega$. Let

$$\sum_{i=1}^{k}\sum_{j=1}^{n}(X_{ij} - \bar X)^2 = Q_1 + Q_2 + Q_3 + Q_4,$$

where $Q_1$, $Q_2$, $Q_3$, and $Q_4$ represent the quadratic forms commonly associated with the sources "linear," "quadratic," "cubic," and "error." Then the hypothesis, as resolved above, can be tested by a succession of three F tests based on the ratios

$$\frac{Q_3}{Q_4}, \quad \frac{Q_2}{Q_3 + Q_4}, \quad\text{and}\quad \frac{Q_1}{Q_2 + Q_3 + Q_4}.$$

If $b_1 = b_2 = b_3 = 0$, these ratios are mutually stochastically independent.
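A sketch of the three F tests follows. Instead of tabled orthogonal-polynomial coefficients, it obtains orthonormal polynomial columns of degree 0 to 3 from a QR decomposition of the Vandermonde matrix. This gives the same single-degree-of-freedom sums of squares $Q_1$, $Q_2$, and $Q_3$. $Q_4$ is taken as the residual sum of squares after the cubic fit, with $kn - 4$ degrees of freedom. These choices are assumptions of the sketch.

```python
import numpy as np
from scipy import stats

def resolved_polynomial_tests(z, X):
    """z: k equally spaced points; X: (k, n) array with X[i, j] the j-th observation at z[i].
    Returns (Q1, Q2, Q3, Q4) and the F tests for b3 = 0, then b2 = 0, then b1 = 0."""
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    k, n = X.shape
    y = X.ravel()                          # row-major: all observations at z[0] first
    zz = np.repeat(z, n)                   # matching design point for each observation
    P, _ = np.linalg.qr(np.vander(zz, 4, increasing=True))   # orthonormal, degrees 0..3
    coef = P.T @ y
    Q1, Q2, Q3 = coef[1] ** 2, coef[2] ** 2, coef[3] ** 2     # linear, quadratic, cubic
    Q4 = np.sum((y - P @ coef) ** 2)                          # residual ("error")
    d4 = k * n - 4
    tests = []
    for num, den, dd in [(Q3, Q4, d4), (Q2, Q3 + Q4, d4 + 1), (Q1, Q2 + Q3 + Q4, d4 + 2)]:
        F = num / (den / dd)
        tests.append((F, 1, dd, stats.f.sf(F, 1, dd)))
    return (Q1, Q2, Q3, Q4), tests

rng = np.random.default_rng(5)
z = np.arange(1.0, 6.0)                                    # k = 5 equally spaced points
X = 2.0 + 0.5 * z[:, None] + rng.normal(size=(5, 3))       # n = 3 observations per point
(Q1, Q2, Q3, Q4), tests = resolved_polynomial_tests(z, X)
```

Each denominator absorbs the higher-degree terms already accepted as zero, which is what makes the three ratios independent under $b_1 = b_2 = b_3 = 0$.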

In practice, some statisticians use a procedure that is almost the reverse of 
the one proposed here; that is, the test begins with the linear term and con- 
tinues as long as a sequence of significant F values is obtained. However, the 
F statistics in that sequence are not mutually independent, and thus it is 
difficult to compute any joint probability. As a matter of fact, in such a proce- 
dure, the over-all hypothesis seems somewhat vague. 

One possible objection to the resolution proposed in this example is that it 
seems difficult to know whether the test should start with b;:=0, with b.=0, 
or, for that matter, with bs=0. But this does not seem to be a valid objection, 
for even if the statistician follows the procedure that is outlined in many books, 
he is not justified in proceeding beyond those b values for which he has “specific 
rationale.” This could determine the b value with which we start our test. 

In each of the four preceding examples, complete sufficient statistics for the parameters exist, and mutual independence is established using the Independence Theorem. While it is not entirely clear how we would proceed under other conditions, the following suggests a possible procedure. Under sufficient regularity conditions, Wald's proof [16] of Wilks' Theorem [17] demonstrates that $-2\ln\lambda_1, -2\ln\lambda_2, \ldots, -2\ln\lambda_k$ are, respectively, asymptotically equivalent to certain quadratic forms, in normally distributed variables, that have chi-square distributions. These facts, together with the identity

$$-2\ln\lambda = \sum_{i=1}^{k}(-2\ln\lambda_i),$$

and a theorem of Hogg and Craig [5], suggest that possibly $-2\ln\lambda_1, \ldots, -2\ln\lambda_k$ are, asymptotically, mutually stochastically independent. If this conjecture were shown to be correct, we could then easily compute an approximate significance level for testing $H_0$ against $H_1$ when this type of resolution is used.

In many non-regular cases, complete sufficient statistics exist, and again we 










can use the Independence Theorem. However, if complete sufficient statistics 
for the non-regular parameters do not exist, Example 4.3 of the next section 
is illustrative of the fact that in these non-regular cases the test statistics are 
not necessarily stochastically independent. 


4. NON-REGULAR CASES 


This section contains three examples; in the first two of these the test statistics are stochastically independent, and in the last example the test statistics are stochastically dependent. The first two examples also emphasize that nonregular parameters are weighted twice as much as regular parameters in computing the degrees of freedom of $-2\ln\lambda$, provided this random variable has an exact or limiting chi-square distribution. In addition, Example 4.3 gives an illustration of a random variable that has a distribution which is a combination of a discrete-type distribution and a continuous-type distribution.


Example 4.1. The Equality of Ranges of Certain Independent Rectangular Distributions

Let us consider $m$ independent rectangular distributions over the intervals $(0, \theta_i)$, $i = 1, \ldots, m$. Let $\Omega = \omega_0$ be the $m$-dimensional parameter space. Let $\omega = \omega_2$ be the zero-dimensional subspace which specifies that each $\theta_i$ is equal to a given number, say $\theta_0$. Let us resolve this hypothesis by letting $\omega_1$ be the one-dimensional subspace which requires that $\theta_1 = \cdots = \theta_m$. If $X_{1i} < \cdots < X_{ni}$ are the order statistics of a random sample of size $n$ from the $i$th distribution, $i = 1, \ldots, m$, then

$$\lambda_1 = \frac{L(\hat\omega_1)}{L(\hat\omega_0)} = \prod_{i=1}^{m}\left[\frac{X_{ni}}{\max_j X_{nj}}\right]^{n}, \qquad \lambda_2 = \frac{L(\hat\omega_2)}{L(\hat\omega_1)} = \left[\frac{\max_j X_{nj}}{\theta_0}\right]^{mn},$$

the latter when $\max_j X_{nj} \le \theta_0$. If $\theta_1 = \cdots = \theta_m$, the Independence Theorem implies that the statistics $\lambda_1$ and $\lambda_2$ are stochastically independent. According to a theorem of Hogg [3], $-2\ln\lambda_1$ and $-2\ln\lambda_2$ have, respectively, exact chi-square distributions with $2(m-1)$ and 2 degrees of freedom under $\omega_2$.
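A simulation sketch under $\omega_2$ illustrates the exact results. The means of $-2\ln\lambda_1$ and $-2\ln\lambda_2$ should be close to $2(m-1)$ and 2, and their sample correlation should be close to zero. The values of $m$, $n$, $\theta_0$, and the seed are arbitrary.

```python
import numpy as np

def simulate_example_4_1(m=4, n=5, theta0=1.0, reps=100_000, seed=6):
    """Draw all m samples from the rectangular distribution over (0, theta0), i.e. under omega_2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, theta0, size=(reps, m, n))
    largest = x.max(axis=2)               # X_{ni} for each sample
    overall = largest.max(axis=1)         # max_j X_{nj}
    m2l1 = -2.0 * n * np.sum(np.log(largest / overall[:, None]), axis=1)
    m2l2 = -2.0 * m * n * np.log(overall / theta0)
    return m2l1, m2l2

m2l1, m2l2 = simulate_example_4_1()
```

Here two nonregular parameters are estimated per degree of freedom count: $m - 1$ restrictions give $2(m-1)$ degrees of freedom, and one restriction gives 2.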


Example 4.2. The Parameters of an Exponential Distribution

Let a distribution have the probability density function

$$f(x; \theta, \rho) = \frac{1}{\rho}\exp\left[-\frac{x - \theta}{\rho}\right], \qquad \theta < x < \infty,$$

zero elsewhere, where $\rho > 0$. Let $\Omega = \omega_0$ be the two-dimensional total parameter







space. Let $\omega = \omega_2$ be the zero-dimensional subspace which specifies that the parameters equal the given constants $\theta_0$ and $\rho_0$, respectively. Let us resolve this hypothesis by letting $\omega_1$ be that one-dimensional subspace which requires $\rho = \rho_0$. If $X_1 < \cdots < X_n$ are the order statistics of a random sample of size $n$ from this exponential distribution, then

$$\lambda_1 = \frac{L(\hat\omega_1)}{L(\hat\omega_0)} = \left[\frac{\sum_{i=1}^{n}(X_i - X_1)}{n\rho_0}\right]^{n}\exp\left[-\frac{\sum_{i=1}^{n}(X_i - X_1)}{\rho_0} + n\right],$$

$$\lambda_2 = \frac{L(\hat\omega_2)}{L(\hat\omega_1)} = \exp\left[-\frac{n(X_1 - \theta_0)}{\rho_0}\right].$$

Moreover, if $\rho = \rho_0$, it follows from the Independence Theorem that $\lambda_1$ and $\lambda_2$ are stochastically independent. The limiting distribution of $-2\ln\lambda_1$ is, provided $\rho = \rho_0$, chi-square with one degree of freedom, in accordance with a theorem of Jones [6]. If the parameters are restricted to $\omega$, $-2\ln\lambda_2$ has an exact chi-square distribution with two degrees of freedom (Hogg [3]).
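The next sketch simulates Example 4.2 with $\theta = \theta_0$ and $\rho = \rho_0$. The mean of $-2\ln\lambda_2$ should be near 2, since it has an exact chi-square distribution with two degrees of freedom. The mean of $-2\ln\lambda_1$ should be near 1 for large $n$, since its limiting distribution is chi-square with one degree of freedom. The sample size, number of replications, and seed are arbitrary.

```python
import numpy as np

def simulate_example_4_2(n=100, theta0=0.0, rho0=1.0, reps=20_000, seed=7):
    rng = np.random.default_rng(seed)
    x = theta0 + rng.exponential(scale=rho0, size=(reps, n))
    x1 = x.min(axis=1)                          # X_1, the smallest order statistic
    total = np.sum(x - x1[:, None], axis=1)     # sum of (X_i - X_1)
    m2l1 = -2.0 * (n * np.log(total / (n * rho0)) - total / rho0 + n)
    m2l2 = 2.0 * n * (x1 - theta0) / rho0
    return m2l1, m2l2

m2l1, m2l2 = simulate_example_4_2()
```

The regular parameter $\rho$ contributes one degree of freedom, while the nonregular parameter $\theta$ contributes two.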


Example 4.3. The Equality of Two Independent Rectangular Distributions

Let us consider two independent rectangular distributions over the intervals $(\theta_i - \rho_i, \theta_i + \rho_i)$, $i = 1, 2$. Let $\Omega = \omega_0$ be the four-dimensional total parameter space. Let $\omega = \omega_2$ be the two-dimensional subspace which specifies that $\theta_1 = \theta_2$ and $\rho_1 = \rho_2$. We can resolve this hypothesis by inserting a three-dimensional subspace $\omega_1$ which requires that $\rho_1 = \rho_2$. Let $X_1 < \cdots < X_n$ and $Y_1 < \cdots < Y_n$ be the order statistics of two random samples, one from each distribution. Here the likelihood ratios are

$$\lambda_1 = \frac{[(X_n - X_1)(Y_n - Y_1)]^{n}}{[\max(X_n - X_1,\, Y_n - Y_1)]^{2n}}, \qquad \lambda_2 = \frac{[\max(X_n - X_1,\, Y_n - Y_1)]^{2n}}{[\max(X_n, Y_n) - \min(X_1, Y_1)]^{2n}}.$$

However, despite this type of resolution of the hypothesis, $\lambda_1$ and $\lambda_2$ are not stochastically independent. The limiting distribution function of $-2\ln\lambda_1$ is, provided $\rho_1 = \rho_2$, one-half the sum of two distribution functions: one chi-square with two degrees of freedom and the other chi-square with four degrees of freedom. If the parameters are restricted to $\omega$, the limiting distribution function of $-2\ln\lambda_2$ is one-half the sum of two distribution functions: one degenerate at zero and the other chi-square with two degrees of freedom. Incidentally, here,

$$-2\ln\lambda = -2\ln\lambda_1 - 2\ln\lambda_2$$

has a limiting chi-square distribution with four degrees of freedom. While









these results are not difficult to obtain, the derivation of one of these limiting distributions, namely, that of $-2\ln\lambda_2$, is at least outlined here. The statistics $\max(X_n, Y_n)$ and $\min(X_1, Y_1)$ are joint complete sufficient statistics for $\theta_1 = \theta_2$ and $\rho_1 = \rho_2$. The distribution of $\lambda_2$ does not depend upon either of these parameters. Hence (i) $\lambda_2$ is stochastically independent of $V = \max(X_n, Y_n) - \min(X_1, Y_1)$, and (ii) we may assume that the samples are taken from the rectangular distribution over the unit interval $(0, 1)$ without changing the distribution of $-2\ln\lambda_2$. The characteristic function of $-2\ln\lambda_2$ is found by writing

$$\lambda_2 V^{2n} = U^{2n} \quad\text{or}\quad -2\ln\lambda_2 - 2\ln V^{2n} = -2\ln U^{2n},$$

where $U = \max(X_n - X_1, Y_n - Y_1)$. The stochastic independence of $\lambda_2$ and $V$ then implies that

$$E[\exp(-2it\ln\lambda_2)]\,E[V^{-4nit}] = E[U^{-4nit}].$$

Now

$$E[V^{-4nit}] = \int_0^1 v^{-4nit}(2n)(2n-1)v^{2n-2}(1-v)\,dv$$

and

$$E[U^{-4nit}] = \int_0^1\!\int_0^u u^{-4nit}\,2n^2(n-1)^2u^{n-2}w^{n-2}(1-u)(1-w)\,dw\,du.$$

Once these rather simple integrals are evaluated, it is easy to show that

$$\lim_{n\to\infty} E[V^{-4nit}] = (1 - 2it)^{-2}$$

and

$$\lim_{n\to\infty} E[U^{-4nit}] = \tfrac12(1 - 2it)^{-2} + \tfrac12(1 - 2it)^{-3};$$

consequently,

$$\lim_{n\to\infty} E[\exp(-2it\ln\lambda_2)] = \tfrac12 + \tfrac12(1 - 2it)^{-1},$$

the desired result.
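A simulation sketch of Example 4.3, with both samples drawn from the rectangular distribution over $(0, 1)$ as in (ii), shows the mixed limiting distributions. The statistic $-2\ln\lambda_2$ is exactly zero whenever one sample contains both extremes of the combined sample. This happens with probability $2\cdot\frac12\cdot\frac{n-1}{2n-1} = \frac{n-1}{2n-1} \to \frac12$. The limiting means of $-2\ln\lambda_1$, $-2\ln\lambda_2$, and $-2\ln\lambda$ are 3, 1, and 4. The sample size, number of replications, and seed are arbitrary.

```python
import numpy as np

def simulate_example_4_3(n=100, reps=10_000, seed=8):
    """Both samples from the rectangular distribution over (0, 1), as in (ii) above."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(reps, n))
    y = rng.uniform(size=(reps, n))
    x_lo, x_hi = x.min(axis=1), x.max(axis=1)
    y_lo, y_hi = y.min(axis=1), y.max(axis=1)
    rx, ry = x_hi - x_lo, y_hi - y_lo
    U = np.maximum(rx, ry)
    V = np.maximum(x_hi, y_hi) - np.minimum(x_lo, y_lo)
    m2l1 = -2.0 * n * (np.log(rx) + np.log(ry) - 2.0 * np.log(U))
    m2l2 = -4.0 * n * np.log(U / V)     # exactly 0 when U == V
    return m2l1, m2l2

n = 100
m2l1, m2l2 = simulate_example_4_3(n=n)
```

The point mass at zero is the discrete component of the limiting distribution mentioned at the start of this section.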


REFERENCES 


[1] Basu, D., “On statistics independent of a complete sufficient statistic,” Sankhyā, 15 (1955), 377-80.

[2] Hartley, H. O., “The maximum F-ratio as a short-cut test for heterogeneity of variance,” Biometrika, 37 (1950), 308-12.

[3] Hogg, Robert V., “On the distribution of the likelihood ratio,” Annals of Mathematical Statistics, 27 (1956), 529-32.

[4] Hogg, Robert V. and Craig, Allen T., “Sufficient statistics in elementary distribution theory,” Sankhyā, 17 (1956), 209-16.

[5] Hogg, Robert V. and Craig, Allen T., “On the decomposition of certain chi-square variables,” Annals of Mathematical Statistics, 29 (1958), 608-10.









[6] Jones, Donald A., On the Limiting Distribution of -2 ln λ in the Non-Regular Case. (Unpublished thesis.) University of Iowa, 1959.
[7] Lehmann, E. L. and Scheffé, Henry, “Completeness, similar regions, and unbiased estimation,” Sankhyā, 10 (1950), 305-40.
[8] Neyman, J. and Pearson, E. S., “On the use and interpretation of certain test criteria for purposes of statistical inference,” Biometrika, 20 (1928), 175-240 and 263-94.
[9] Neyman, J. and Pearson, E. S., “On the problem of two samples,” Bulletin de l’Académie Polonaise des Sciences et des Lettres, Série A (1930), 73-96.
[10] Neyman, J. and Pearson, E. S., “On the problem of k samples,” Bulletin de l’Académie Polonaise des Sciences et des Lettres, Série A (1931), 460-81.
[11] Pearson, E. S. and Wilks, S. S., “Methods of statistical analysis appropriate for k samples of two variables,” Biometrika, 25 (1933), 353-78.
[12] Roy, S. N. and Bargmann, R. E., “Tests of multiple independence and the associated confidence bounds,” Annals of Mathematical Statistics, 29 (1958), 491-503.
[13] Scheffé, Henry, “A method for judging all contrasts in the analysis of variance,” Biometrika, 40 (1953), 87-104.
[14] Sukhatme, P. V., “On the analysis of k samples from exponential populations with especial reference to the problem of random intervals,” Statistical Research Memoirs, 1 (1936), 94-112.
[15] Tukey, John W., The Problem of Multiple Comparisons. Princeton University, 1953.
[16] Wald, Abraham, “Tests of statistical hypotheses concerning several parameters when the number of observations is large,” Transactions of the American Mathematical Society, 54 (1943), 426-82.
[17] Wilks, S. S., “The large-sample distribution of the likelihood ratio for testing composite hypotheses,” Annals of Mathematical Statistics, 9 (1938), 60-2.





THE ASYMPTOTIC VARIANCES OF METHOD OF MOMENTS 
ESTIMATES OF THE PARAMETERS OF THE TRUNCATED 
BINOMIAL AND NEGATIVE BINOMIAL DISTRIBUTIONS 


S. M. Shah
Gujarat University, Ahmedabad, India 


In this paper the asymptotic variances of method of moments es- 
timates of the parameters of the truncated binomial and negative bi- 
nomial distributions have been derived. For the binomial distribution 
with one class from the lower end of the distribution truncated, the 
efficiency of the estimate relative to maximum likelihood has been 
compared. 


1. INTRODUCTION 


Rider [3] has obtained estimates of the parameters of the truncated binomial and negative binomial distributions by the method of moments. But estimates are of little use without some knowledge of their distributions. In this note the asymptotic variances of these estimates are given. In the case of the binomial distribution with one class from the lower end of the distribution truncated, the efficiency of this estimate relative to maximum likelihood is also considered.


2. TRUNCATED BINOMIAL DISTRIBUTION 


Consider the binomial distribution in which $k$ classes from the lower end of the distribution are truncated. The probability law of this truncated distribution is

$$p_x = \frac{\binom{n}{x}p^x q^{n-x}}{\sum_{x=k}^{n}\binom{n}{x}p^x q^{n-x}}, \qquad x = k, k+1, \ldots, n,$$

where $q = 1 - p$.


Rider [3] has shown how to estimate $p$ by the method of moments from a sample taken from the above truncated binomial distribution. His estimate is

$$\hat p = \frac{T_2 - kT_1}{(n-1)T_1 - (k-1)nT_0},$$

where

$$T_0 = \sum_{x=k}^{n} f_x, \qquad T_1 = \sum_{x=k}^{n} x f_x, \qquad T_2 = \sum_{x=k}^{n} x^2 f_x,$$

$f_x$ being the observed frequency corresponding to $x$ in the sample.
Let us define the sample moments about the origin by

$$a_r = \frac{1}{T_0}\sum_{x=k}^{n} x^r f_x,$$





VARIANCE OF ESTIMATES FOR TRUNCATED BINOMIAL DISTRIBUTIONS 


and the corresponding moments of the truncated distribution by

$$\alpha_r = \sum_{x=k}^{n} x^r p_x.$$

Then $\hat p$ can be written as

$$\hat p = \frac{a_2 - k a_1}{(n-1)a_1 - n(k-1)}. \tag{1}$$

Thus $\hat p$ is a function of sample moments. That the estimate is consistent follows from the application of Slutsky's theorem (Cramér [1], §20.6). That it is asymptotically normally distributed as $T_0 \to \infty$ can be seen by applying the theorem of Cramér [1], §28.4. The asymptotic variance of $\hat p$ has been worked out in accordance with the theorem of Cramér [1], §27.3.3, and is given below in matrix form:

$$V(\hat p) = \frac{p^2\, B'\mathbf{A}_{22}B}{T_0(\alpha_2 - k\alpha_1)^2}, \tag{2}$$

where

$$B = \begin{pmatrix} -\alpha \\ 1 \end{pmatrix}, \qquad \mathbf{A}_{22} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad \alpha = k + (n-1)p, \qquad A_{rs} = \alpha_{r+s} - \alpha_r\alpha_s.$$


For illustrating these results, let us consider the data of Table 1 (taken from 
Rider’s paper [3]). 


TABLE 1. NUMBER OF BOYS IN FAMILIES HAVING EIGHT CHILDREN

No. of Boys    No. of Families
     0               215
     1             1,485
     2             5,331
     3            10,649
     4            14,959
     5            11,929
     6             6,678
     7             2,092
     8               342
   Total          53,680





Suppose one class from the lower end of the distribution is truncated, i.e., $k = 1$. Then we have

$$T_0 = 53465, \quad a_1 = 4.1340, \quad a_2 = 19.097, \quad a_3 = 95.675, \quad a_4 = 510.8555.$$





Using formulas (1) and (2) for $k = 1$, we obtain

$$\hat p = 0.51707, \qquad V(\hat p) = 7691 \times 10^{-10}.$$
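The computation for Table 1 can be reproduced as below. The sketch takes the entry for seven boys to be 2,092, the value for which the frequencies add to the printed total of 53,680; with this value the moments agree with those quoted above. Sample moments stand in for the $\alpha_r$ in (2).

```python
import numpy as np

# Table 1; the entry for seven boys is taken as 2,092 (the value that makes the
# frequencies add to the printed total of 53,680).
families = np.array([215, 1485, 5331, 10649, 14959, 11929, 6678, 2092, 342])
boys = np.arange(families.size)          # 0, 1, ..., 8
n, k = 8, 1                              # eight children, class x = 0 truncated

x, f = boys[boys >= k], families[boys >= k]
T0 = f.sum()
a = {r: np.sum(x ** r * f) / T0 for r in range(1, 5)}     # sample moments a_1..a_4

p_hat = (a[2] - k * a[1]) / ((n - 1) * a[1] - n * (k - 1))          # formula (1)

# Formula (2), with sample moments in place of the alpha_r.
alpha = k + (n - 1) * p_hat
B = np.array([-alpha, 1.0])
A22 = np.array([[a[2] - a[1] ** 2, a[3] - a[1] * a[2]],
                [a[3] - a[1] * a[2], a[4] - a[2] ** 2]])
V_p_hat = p_hat ** 2 * (B @ A22 @ B) / (T0 * (a[2] - k * a[1]) ** 2)
```

The factor $p^2/(\alpha_2 - k\alpha_1)^2$ is $1/[(n-1)\alpha_1 - n(k-1)]^2$, the squared derivative of (1) with respect to $a_2$.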
3. COMPARISON WITH MAXIMUM LIKELIHOOD ESTIMATE 


For the sake of simplicity, we consider the binomial distribution having one class from the lower end of the distribution truncated. The maximum likelihood estimate $p^*$ of $p$ based on a sample of size $T_0$ is given by Finney [2] as the solution of

$$a_1 = \frac{np^*}{1 - (1 - p^*)^n}, \tag{3}$$

where $a_1$ is the first sample moment about the origin. The variance of $p^*$ is given by

$$V(p^*) = \frac{pq^2(1 - q^n)^2}{T_0\, n\,[(1 - q^n)(np + q) - np]}. \tag{4}$$

For the data of Table 1 and $k = 1$, we obtain

$$p^* = 0.51517, \qquad V(p^*) = 7053 \times 10^{-10}.$$

We observe that the efficiency of Rider's estimate relative to maximum likelihood is 91.7 per cent.
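Equation (3) has no closed-form solution. However, the truncated mean $np/[1 - (1-p)^n]$ increases with $p$, so simple bisection is enough. The helper below is hypothetical and is not Finney's procedure. It recovers $p^*$ for Table 1, assuming the entry for seven boys is 2,092.

```python
def truncated_binomial_mle(a1, n, tol=1e-12):
    """Solve equation (3), a1 = n p / (1 - (1 - p)**n), for p by bisection."""
    def excess(p):
        return n * p / (1.0 - (1.0 - p) ** n) - a1
    lo, hi = 1e-9, 1.0 - 1e-12          # excess(lo) < 0 < excess(hi) when 1 < a1 < n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Table 1 with x = 0 truncated (entry for seven boys taken as 2,092).
freq = {1: 1485, 2: 5331, 3: 10649, 4: 14959, 5: 11929, 6: 6678, 7: 2092, 8: 342}
T0 = sum(freq.values())
a1 = sum(x * f for x, f in freq.items()) / T0
p_star = truncated_binomial_mle(a1, n=8)
```

Bisection is chosen over Newton's method only for robustness; any root finder that respects the bracket would do.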


4. TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION 


We consider the negative binomial distribution in which one class from the lower end of the distribution is truncated, as was done by Rider [3]. The probability law of the truncated distribution is

$$p_x = \frac{(m + x - 1)!}{x!\,(m-1)!}\,\frac{p^x}{(1+p)^{m+x}} \Bigg/ \left[1 - (1+p)^{-m}\right], \qquad x \ge 1.$$

Consider a sample of size $T_0$ from this truncated distribution. Using the method of moments, the estimates obtained by Rider [3] are

$$\hat p = \frac{T_1T_3 - T_1T_2 + T_1^2 - T_2^2}{T_1(T_2 - T_1)}, \tag{5}$$

$$\hat m = \frac{2T_2^2 - T_1T_2 - T_1T_3}{T_1T_3 - T_1T_2 + T_1^2 - T_2^2}, \tag{6}$$

where $T_1$, $T_2$, $T_3$ have the same meaning as in the case of the truncated binomial distribution (with $T_3 = \sum x^3 f_x$). Defining $a_r$ and $\alpha_r$ as before, we see that the estimates are functions of sample moments and hence are asymptotically normally distributed. By applying the same theorem used before, the variances and covariance of the estimates are obtained as







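$$V(\hat p) = \frac{C'\mathbf{A}_{33}C}{T_0\,\alpha_1^2 p^2}, \tag{7}$$

$$V(\hat m) = \frac{D'\mathbf{A}_{33}D}{T_0\,\alpha_1^2 p^2}, \tag{8}$$

$$\operatorname{Cov}(\hat p, \hat m) = \frac{C'\mathbf{A}_{33}D}{T_0\,\alpha_1^2 p^2}, \tag{9}$$

where

$$C = \begin{pmatrix} g/(m+1) \\ -h/(m+1) \\ 1/(m+1) \end{pmatrix}, \qquad D = \begin{pmatrix} -g'/p \\ h'/p \\ -1/p \end{pmatrix}, \qquad \mathbf{A}_{33} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix},$$

$$h = 2b + p + 1, \quad g' = g + bp, \quad h' = h + p, \quad g = b^2 + p + 1, \quad b = mp + p + 1.$$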


As an illustration, let us take the data of Table 2 (taken from Rider's paper [3]). Suppose the class corresponding to $x = 0$ is truncated.

TABLE 2. NUMBER OF YEAST CELLS PER SQUARE IN HEMOCYTOMETER

No. of Cells    Frequency
     0             213
     1             128
     2              37
     3              18
     4               3
     5               1

Then we have $T_0 = 187$, $T_1 = 273$, $T_2 = 511$, $T_3 = 1227$. Hence, using formulas (5) and (6), we obtain

$$\hat p = 0.136608, \qquad \hat m = 5.381703.$$

The sample moments about the origin are

$$a_1 = 1.4599,\quad a_2 = 2.7326,\quad a_3 = 6.5615,\quad a_4 = 19.0962,\quad a_5 = 63.5454,\quad a_6 = 232.7861.$$

Using the estimates for the parameters, we get

$$b = 1.8717,\quad g = 4.6399,\quad g' = 4.8956,\quad h = 4.8800,\quad h' = 5.0166$$

and

$$C = \frac{1}{6.3817}\begin{pmatrix} 4.6399 \\ -4.8800 \\ 1.0000 \end{pmatrix}, \qquad D = \frac{1}{0.1366}\begin{pmatrix} -4.8956 \\ 5.0166 \\ -1.0000 \end{pmatrix}, \qquad \mathbf{A}_{33} = \begin{pmatrix} 0.6013 & 2.5722 & 9.5171 \\ 2.5722 & 11.6291 & 45.6154 \\ 9.5171 & 45.6154 & 189.7328 \end{pmatrix}.$$






Using formulas (7), (8), and (9), we obtain

$$V(\hat p) = 0.02062, \qquad V(\hat m) = 43.0900, \qquad \operatorname{Cov}(\hat p, \hat m) = -0.03726.$$
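The negative binomial figures can be reproduced as follows. The sketch takes the Table 2 frequencies for $x = 1, \ldots, 5$ to be 128, 37, 18, 3, and 1. These values are inferred rather than read directly. What supports them is the check that they reproduce $T_0, \ldots, T_3$ and $a_1, \ldots, a_6$ as printed. The sketch evaluates (5) to (8), with sample moments in place of the $\alpha_r$.

```python
import numpy as np

# Table 2 with x = 0 truncated; frequencies for x = 1..5 (inferred, see above).
x = np.arange(1, 6)
f = np.array([128, 37, 18, 3, 1])

T0 = int(f.sum())
T1, T2, T3 = (int(np.sum(x ** r * f)) for r in (1, 2, 3))
a = np.array([np.sum(x ** r * f) / T0 for r in range(1, 7)])    # a_1 .. a_6

den = T1 * T3 - T1 * T2 + T1 ** 2 - T2 ** 2
p_hat = den / (T1 * (T2 - T1))                          # formula (5)
m_hat = (2 * T2 ** 2 - T1 * T2 - T1 * T3) / den         # formula (6)

# A_rs = alpha_{r+s} - alpha_r alpha_s, r, s = 1, 2, 3 (alpha_j is a[j - 1]).
A33 = np.array([[a[r + s - 1] - a[r - 1] * a[s - 1] for s in (1, 2, 3)] for r in (1, 2, 3)])
b = m_hat * p_hat + p_hat + 1
g = b ** 2 + p_hat + 1
h = 2 * b + p_hat + 1
C = np.array([g, -h, 1.0]) / (m_hat + 1)
D = np.array([-(g + b * p_hat), h + p_hat, -1.0]) / p_hat
scale = T0 * a[0] ** 2 * p_hat ** 2
V_p = C @ A33 @ C / scale      # formula (7)
V_m = D @ A33 @ D / scale      # formula (8)
```

Formulas (5) and (6) follow from the ratios of factorial moments, $\mu_{[2]}/\mu_{[1]} = (m+1)p$ and $\mu_{[3]}/\mu_{[2]} = (m+2)p$. These ratios are unaffected by truncating the class $x = 0$.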


Maximum likelihood estimation of the parameters of the truncated negative binomial distribution is complicated, and calculating the variances of the estimates would be difficult. Hence the efficiency of Rider's estimates of the parameters of the truncated negative binomial distribution has not been compared with that of the maximum likelihood estimates.


ACKNOWLEDGMENT 


I am extremely grateful to the referee for making useful suggestions in the 
preparation of the paper. 


REFERENCES 


[1] Cramér, H., Mathematical Methods of Statistics. Princeton, New Jersey: Princeton 
University Press, 1951. 

[2] Finney, D. J., “The truncated binomial distribution,” Annals of Eugenics, 14 (1949), 319-28.

[3] Rider, Paul R., “Truncated Binomial and Negative Binomial Distributions,” Journal 
of the American Statistical Association, 50 (1955), 877. 





A NOMOGRAPH FOR COMPUTING PARTIAL 
CORRELATION COEFFICIENTS* 


Ruth W. Lees and Frederic M. Lord
Educational Testing Service, Princeton, New Jersey

The accompanying nomograph may be used for calculating approximately the partial correlation coefficient

$$r_{12\cdot3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}.$$


DIRECTIONS FOR USING THE NOMOGRAPH 










A key to the lettering of the scales will be found in the upper left-hand corner of the nomograph. The A, C, D, and E scales are marked off at intervals of .01. The B scale is marked off at intervals of .02.

1. Using the A scale, find the curved line for whichever of the two correlations, $r_{13}$ or $r_{23}$, is larger in absolute value.

2. Using the B scale, find the curved line for whichever of the two correlations, $r_{13}$ or $r_{23}$, is smaller in absolute value. Use the upper part of the B scale if $r_{13}$ and $r_{23}$ have the same sign; use the lower part if they have different signs.

3. Find the point where the two curved lines that you have found meet. (If the values of $r_{13}$ and $r_{23}$ are approximately equal, the point sought in step 3 will lie close to the point $\frac12(r_{13} + r_{23})$ on the C or the D scale.)

4. Find the point $r_{12}$ on the B scale. Use the upper part of the scale if $r_{12}$ is positive; use the lower part if $r_{12}$ is negative.

5. Place a ruler on the nomograph so that it passes through the two points found in steps 3 and 4 and crosses the E scale.

6. Read the value of $r_{12\cdot3}$ where the ruler crosses the E scale.


DISCUSSION 


The reader may wish to try out this procedure with the following examples:

   r13     r23     r12    r12.3
   .20     .50     .60     .59
   .04    -.14     .15     .21
  -.53     .76     .79    -.70
  -.44    -.97     .30    -.58
   .30     .30    -.47    -.62

It will be seen that whenever interpolation is necessary, or when the angle between the ruler and the B scale is small, $r_{12\cdot3}$ cannot be determined as accurately as otherwise.
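Some of the examples can also be checked directly against the formula for $r_{12\cdot3}$. The helper below is a minimal sketch.

```python
import math

def partial_r(r12, r13, r23):
    """r_{12.3} = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2))."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

# (r13, r23, r12, tabled r12.3) for three of the examples above
for r13, r23, r12, tabled in [(.20, .50, .60, .59), (-.44, -.97, .30, -.58), (.30, .30, -.47, -.62)]:
    print(f"computed {partial_r(r12, r13, r23):+.3f}, tabled {tabled:+.2f}")
```

Discrepancies in the second decimal place are of the size one would expect from reading a nomograph.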





* Enlarged copies of this nomograph and of the nomograph for multiple correlations (Lord [1]), covered with laminated plastic, may be obtained without charge by writing to Evaluation and Advisory Service, Educational Testing Service, Princeton, N. J.










[Figure: Nomograph for computing the partial correlation coefficient, with a key to the scales in the upper left-hand corner.]













The nomograph contains two vertical scales, the E scale for $r_{12.3}$ and the B scale for $r_{12}$. The triangle in the nomograph contains a curvilinear coordinate system for $r_{13}$ and $r_{23}$. Any one curve in this system may refer to either $r_{13}$ or $r_{23}$, as the user chooses. Each curve is actually a portion of an ellipse. The "curve" for $r_{13}$ or $r_{23} = 0$ is the horizontal A scale; the "curve" for $r_{13}$ or $r_{23} = 1.00$ coincides with the vertical B scale. The two straight lines C and D, tangent to all of the ellipses and forming the upper and lower boundaries of the network, contain all points for which $r_{13}$ equals $r_{23}$ in absolute value.

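The following determinantal equation was used in the construction of the nomograph (by methods described in standard texts on nomography, also briefly outlined in reference [1]):

$$
\begin{vmatrix}
\dfrac{a_{13}a_{23}}{a_{13}a_{23}+k} & \dfrac{-r_{13}r_{23}}{a_{13}a_{23}+k} & 1 \\[2ex]
1 & r_{12.3} & 1 \\[1ex]
0 & -\dfrac{r_{12}}{k} & 1
\end{vmatrix} = 0,
$$

where $a_{ij} = \sqrt{1 - r_{ij}^2}$. Expanding the determinant gives $r_{12} = r_{13}r_{23} + a_{13}a_{23}\,r_{12.3}$, that is, $r_{12.3} = (r_{12} - r_{13}r_{23})/(a_{13}a_{23})$. The constant $k$ determines the slope of the C and D scales, and was set equal to $-1.25$.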
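The scales can realize the partial-correlation formula in a way that is easier to see in explicit coordinates. The sketch below is hypothetical and rests on three assumptions:

- the E scale is the vertical line x = 1, with r12.3 plotted at height r12.3;
- the B scale is the line x = 0, with r12 plotted at height −r12/k;
- the r13 and r23 curves meet at x = a13·a23/(a13·a23 + k), y = −r13·r23/(a13·a23 + k).

With these coordinates, the r23 = 0 "curve" is the horizontal line y = 0 and the r13 = 1 "curve" falls on x = 0, matching the description of the A and B scales. The function names are illustrative only.

```python
import math

K = -1.25  # the value of k quoted in the text


def a(r):
    """a = sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def network_point(r13, r23, k=K):
    """Assumed (x, y) position where the r13 and r23 curves intersect."""
    d = a(r13) * a(r23) + k
    return a(r13) * a(r23) / d, -r13 * r23 / d


def b_scale_point(r12, k=K):
    """Assumed position of r12 on the vertical B scale (the line x = 0)."""
    return 0.0, -r12 / k


def read_e_scale(r13, r23, r12, k=K):
    """Lay a 'ruler' through the two points and read it where it crosses x = 1 (E scale)."""
    x1, y1 = network_point(r13, r23, k)
    x0, y0 = b_scale_point(r12, k)
    slope = (y1 - y0) / (x1 - x0)  # undefined if r13 or r23 is exactly +-1
    return y0 + slope * (1.0 - x0)


# Same value that (r12 - r13*r23) / (a13*a23) gives for this example
print(round(read_e_scale(0.20, 0.50, 0.60), 4))
```

If k = -1.25 is replaced by k = -1 in this sketch, the denominator becomes a13·a23 − 1. That denominator tends to 0 as both correlations approach 0, so the network point runs off toward x → −∞. With k = −1.25 the x-coordinate stays between −4 and 0. This is one way to see why a chart with k = −1 cannot accommodate small values of both r13 and r23.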

Ward [2] developed a similar nomograph whose equation is identical to ours, with the significant exception that $k = -1$. As a result, his nomograph may be used only when either $r_{13}$ or $r_{23}$ is greater than .55 in absolute value. His curves are given only at intervals of .05 units.


REFERENCES 


[1] Lord, Frederic M., "Nomograph for computing multiple correlation coefficients," Journal of the American Statistical Association, 50 (1955), 1073-7.

[2] Ward, D. H., "A nomogram for calculating partial correlation coefficients," Incorporated Statistician, IX (1959), 139-41.








STEPWISE LEAST SQUARES: RESIDUAL ANALYSIS 
AND SPECIFICATION ERROR* 


ARTHUR S. GOLDBERGER
University of Wisconsin 


A stepwise procedure for estimating the parameters of the multiple linear regression model $y = X_1\beta_1 + X_2\beta_2 + \epsilon$ is often used. Two articles in the March 1961 issue of this Journal focussed on the connection between the stepwise and the direct estimator of $\beta_2$. The present paper spells out the connection between the stepwise and the direct estimator of $\beta_1$ as a special case of Theil's analysis of specification error. A unified treatment of the two stepwise estimators is also provided.


1. INTRODUCTION 


Consider the multiple linear regression model $y = X\beta + \epsilon$, partitioned as


$$y = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad (1.1)$$
where 


$y$ is the $N \times 1$ vector of observations on the regressand,
$X_1$ is the $N \times K_1$ matrix of observations on the first $K_1$ regressors: $x_i$ ($i = 1, \ldots, K_1$),
$\beta_1$ is the $K_1 \times 1$ vector of coefficients of the first $K_1$ regressors,
$X_2$ is the $N \times K_2$ matrix of observations on the second $K_2$ regressors: $x_j$ ($j = K_1 + 1, \ldots, K_1 + K_2$),
$\beta_2$ is the $K_2 \times 1$ vector of coefficients of the second $K_2$ regressors, and
$\epsilon$ is the $N \times 1$ vector of disturbances.


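The classical estimators of $\beta_1$ and $\beta_2$ are obtained by direct least squares, $b = (X'X)^{-1}X'y$, which may be partitioned as

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}. \qquad (1.2)$$

On the other hand, the stepwise least squares procedure first estimates $\beta_1$ by

$$\hat b_1 = (X_1'X_1)^{-1}X_1'y \qquad (1.3)$$

and then regresses the residuals

$$\hat e = y - X_1\hat b_1 = y - X_1(X_1'X_1)^{-1}X_1'y = [I - X_1(X_1'X_1)^{-1}X_1']\,y \qquad (1.4)$$

upon $X_2$ to estimate $\beta_2$ as

$$\hat b_2 = (X_2'X_2)^{-1}X_2'\hat e = (X_2'X_2)^{-1}X_2'[I - X_1(X_1'X_1)^{-1}X_1']\,y. \qquad (1.5)$$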
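The two procedures can be compared directly in a minimal NumPy sketch on simulated data. The following are all assumptions made for illustration:

- the data-generating values;
- the constant column in X1;
- the variable names `b1_step`, `b2_step` and `e_hat`.

The direct estimates come from one regression on the full regressor matrix, as in (1.2). The stepwise estimates follow (1.3) through (1.5).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
# Hypothetical data: X1 holds a constant and one regressor; X2 is a single regressor
# deliberately correlated with the non-constant column of X1.
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = (0.6 * X1[:, 1] + rng.normal(size=N)).reshape(N, 1)
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([1.5]) + rng.normal(size=N)

# Direct least squares (1.2): one regression of y on [X1 X2]
X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)
b1, b2 = b[:2], b[2:]

# Stepwise least squares
b1_step = np.linalg.solve(X1.T @ X1, X1.T @ y)      # (1.3)
e_hat = y - X1 @ b1_step                             # residuals, (1.4)
b2_step = np.linalg.solve(X2.T @ X2, X2.T @ e_hat)   # (1.5)

print("direct  :", b1.round(3), b2.round(3))
print("stepwise:", b1_step.round(3), b2_step.round(3))
```

The residuals `e_hat` are orthogonal to X1 by construction. Because X2 is correlated with X1 here, `b2_step` comes out smaller in absolute value than `b2`. This is the K2 = 1 behaviour reported in the earlier articles.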


This stepwise procedure has recently been discussed in two articles in the 
March 1961 Journal of the American Statistical Association.¹ In both cases the


* This is a project of the Social Systems Research Institute. The author is indebted to Dr. Tong Hun Lee of 
the Institute for suggesting the present approach. 

1 R. J. Freund, R. W. Vail, and C. W. Clunies-Ross, "Residual Analysis," and A. S. Goldberger and D. B. Jochems, "Note on Stepwise Least Squares," Journal of the American Statistical Association, 56 (1961), 98-104, 105-10.












discussion was focussed on the connection between the direct and stepwise estimators of $\beta_2$. This note is intended to fill the gap by discussing the connection between the direct and stepwise estimators of $\beta_1$. In doing so, it seems desirable to present a unified treatment by the use of partitioned matrix inversion.

2. RESIDUAL ANALYSIS AND SPECIFICATION ERROR 


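As is well known, the inverse of the non-singular symmetric matrix

$$\begin{bmatrix} A & B \\ B' & C \end{bmatrix}, \qquad (2.1)$$

where $A$ and $C$ are square, may be written as

$$\begin{bmatrix} A & B \\ B' & C \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BF^{-1}B'A^{-1} & -A^{-1}BF^{-1} \\ -F^{-1}B'A^{-1} & F^{-1} \end{bmatrix}, \qquad (2.2)$$

where

$$F = C - B'A^{-1}B, \qquad (2.3)$$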


provided that $A$ and $F$ are non-singular.² This provision is met by the regressor moment matrix $X'X$, so that the direct least squares estimators of (1.2) may be written as


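$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} (X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2F^{-1}X_2'X_1(X_1'X_1)^{-1} & -(X_1'X_1)^{-1}X_1'X_2F^{-1} \\ -F^{-1}X_2'X_1(X_1'X_1)^{-1} & F^{-1} \end{bmatrix} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}, \qquad (2.4)$$

where

$$F = X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2. \qquad (2.5)$$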
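The partitioned-inverse formula (2.2)-(2.3) is easy to check numerically. In the sketch below, the function name and the simulated data are hypothetical. The formula is applied to a moment matrix X'X split into the blocks X1'X1, X1'X2 and X2'X2, so that F is the matrix defined in (2.5). The result is compared with a direct inverse.

```python
import numpy as np


def partitioned_inverse(A, B, C):
    """Inverse of the symmetric matrix [[A, B], [B', C]] by (2.2)-(2.3)."""
    A_inv = np.linalg.inv(A)
    F = C - B.T @ A_inv @ B                  # (2.3); must be non-singular
    F_inv = np.linalg.inv(F)
    top_left = A_inv + A_inv @ B @ F_inv @ B.T @ A_inv
    top_right = -A_inv @ B @ F_inv
    bottom_left = -F_inv @ B.T @ A_inv
    return np.block([[top_left, top_right], [bottom_left, F_inv]])


# Apply it to a regressor moment matrix X'X, split as in (2.4)-(2.5)
rng = np.random.default_rng(1)
X1 = np.column_stack([np.ones(50), rng.normal(size=50)])
X2 = rng.normal(size=(50, 2))
X = np.hstack([X1, X2])
XtX = X.T @ X
inv_blocks = partitioned_inverse(X1.T @ X1, X1.T @ X2, X2.T @ X2)
print(np.allclose(inv_blocks, np.linalg.inv(XtX)))
```

In practice one would solve the normal equations rather than form explicit inverses. The explicit blocks are used here only because they mirror (2.4) term by term.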
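Now (2.4) reduces to

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2F^{-1}X_2'[I - X_1(X_1'X_1)^{-1}X_1']\,y \\ F^{-1}X_2'[I - X_1(X_1'X_1)^{-1}X_1']\,y \end{bmatrix}. \qquad (2.6)$$

Then inserting the identity matrix $FF^{-1}$, and also the second line of (2.6), into (1.5) gives

$$\hat b_2 = (X_2'X_2)^{-1}FF^{-1}X_2'[I - X_1(X_1'X_1)^{-1}X_1']\,y = (X_2'X_2)^{-1}F\,b_2 = [I - (X_2'X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'X_2]\,b_2, \qquad (2.7)$$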


where the definition of F in (2.5) has been used. This is the basic result of the 
residual analysis aspect of stepwise least squares.³ There will be a bias in the





2 See Marvin Marcus, Basic Theorems in Matrix Theory. Washington: National Bureau of Standards, 1960, p. 17.

3 See Freund, Vail, and Clunies-Ross, op. cit., equation (4.5), p. 101 (in which the first term on the right should be @ rather than C), and Goldberger and Jochems, op. cit., equation (2.5), p. 106.





1000 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961


stepwise estimator ($E\hat b_2 \neq Eb_2 = \beta_2$) unless $X_1$ and $X_2$ are orthogonal, or unless $\beta_2 = 0$.
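Equation (2.7) is an algebraic identity, so it can be verified on arbitrary data. The sketch below uses simulated data, and the names `b2_step` and `shrink` are hypothetical. It computes three quantities:

- the direct b2, from the full regression;
- the stepwise b2, from (1.5);
- the matrix (X2'X2)^{-1}F.

It then checks that the stepwise estimate equals that matrix times b2.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
# Hypothetical regressors: the first column of X2 is correlated with X1
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = np.column_stack([X1[:, 1] + rng.normal(size=N), rng.normal(size=N)])
y = rng.normal(size=N)  # (2.7) is an algebraic identity, so any y will do

XtX1_inv = np.linalg.inv(X1.T @ X1)
M1 = np.eye(N) - X1 @ XtX1_inv @ X1.T                 # I - X1 (X1'X1)^-1 X1'
F = X2.T @ X2 - X2.T @ X1 @ XtX1_inv @ X1.T @ X2      # (2.5)

b2 = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0][2:]  # direct estimator
b2_step = np.linalg.solve(X2.T @ X2, X2.T @ M1 @ y)              # stepwise, (1.5)
shrink = np.linalg.solve(X2.T @ X2, F)                           # (X2'X2)^-1 F in (2.7)

print(np.allclose(b2_step, shrink @ b2))
```

When X1'X2 = 0, F reduces to X2'X2 and `shrink` to the identity. The stepwise and direct estimates of the second-set coefficients then coincide.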

To turn to the stepwise estimator of $\beta_1$, insertion of the second line of (2.6) into the first gives

$$b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2\,b_2, \qquad (2.8)$$

which, in view of the definition of the stepwise estimator in (1.3), gives after rearrangement

$$\hat b_1 = b_1 + (X_1'X_1)^{-1}X_1'X_2\,b_2. \qquad (2.9)$$


This we take to be the basic result of the specification error aspect of stepwise least squares. It will be recognized that

$$P = (X_1'X_1)^{-1}X_1'X_2 \qquad (2.10)$$

has an immediate interpretation as the matrix of coefficients in the "auxiliary regressions" of $X_2$ upon $X_1$; that is, the $j$th column of $P$ contains the coefficients of the multiple regression of $x_j$ upon $x_1, \ldots, x_i, \ldots, x_{K_1}$. These auxiliary regressions are the central device used in the analysis of specification error developed by Theil.⁴ From this point of view, the first step of the stepwise procedure mis-specifies the model $y = X_1\beta_1 + X_2\beta_2 + \epsilon$ as $y = X_1\beta_1 + \epsilon$, and hence mis-estimates $\beta_1$ as $\hat b_1$ instead of as $b_1$. There will be a specification bias in this stepwise estimator ($E\hat b_1 \neq Eb_1 = \beta_1$) unless $P = 0$ (that is, unless $X_1'X_2 = 0$), or unless $Eb_2 = \beta_2 = 0$. A verbal interpretation of (2.9) is that the stepwise estimator gives $X_1$ credit not only for its own influence upon $y$ (represented by $\beta_1$ and unbiasedly estimated by $b_1$), but also for the influence of $X_2$ upon $y$ (represented by $\beta_2$ and unbiasedly estimated by $b_2$), to the extent that $X_1$ and $X_2$ are correlated.
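Equations (2.9) and (2.10) can likewise be checked numerically. In the sketch below, the data are simulated and the variable names are assumptions. P is obtained exactly as the text describes, by running the auxiliary regressions of each column of X2 on X1 rather than from the closed form. The stepwise `b1_step` is then compared with b1 + P b2.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
# Hypothetical data in which the first X2 column is correlated with X1
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = np.column_stack([0.8 * X1[:, 1] + rng.normal(size=N), rng.normal(size=N)])
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([0.5, -1.0]) + rng.normal(size=N)

b = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]
b1, b2 = b[:2], b[2:]                               # direct estimators
b1_step = np.linalg.lstsq(X1, y, rcond=None)[0]     # stepwise first step, (1.3)

# (2.10): column j of P holds the auxiliary-regression coefficients of X2[:, j] on X1
P = np.linalg.lstsq(X1, X2, rcond=None)[0]

print("stepwise b1      :", b1_step.round(3))
print("b1 + P b2, (2.9) :", (b1 + P @ b2).round(3))
```

Both printed rows agree to rounding. The gap between `b1_step` and `b1` is exactly P @ b2, the credit for X2's influence that the stepwise procedure assigns to X1.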

As for the direction of the bias, the previous articles showed that for the special case $K_2 = 1$, the stepwise estimator $\hat b_2$ underestimated the absolute value of the coefficient.⁵ In the same special case we may indicate the direction of the bias of the stepwise estimator $\hat b_1$ as follows: the coefficient of $x_i$ ($i = 1, \ldots, K_1$) will be overstated by the stepwise estimator if the partial regression coefficient of $x_{K_1+1}$ on $x_i$ (in the regression of $x_{K_1+1}$ upon the set $X_1$) has the same sign as the partial regression coefficient of $y$ upon $x_{K_1+1}$ (in the regression of $y$ upon the full set $X_1$, $X_2$); it will be understated if the two signs differ.
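For K2 = 1, equation (2.9) reduces to b1_step[i] − b1[i] = p_i · b2, where:

- p_i is the coefficient of x_i in the auxiliary regression of the single second-set regressor on X1;
- b2 is that regressor's coefficient in the full regression.

The sign rule follows directly. The sketch below illustrates it on simulated data; all numbers and names are assumptions. It reports "overstated" when the stepwise estimate exceeds the direct one algebraically.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
# Hypothetical data: three first-set regressors (including a constant), one second-set regressor
X1 = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
x2 = 0.7 * X1[:, 1] - 0.5 * X1[:, 2] + rng.normal(size=N)
y = X1 @ np.array([1.0, 2.0, 3.0]) + 1.5 * x2 + rng.normal(size=N)

b = np.linalg.lstsq(np.column_stack([X1, x2]), y, rcond=None)[0]
b1, b2 = b[:3], b[3]                               # b2: partial coefficient of y on x2
b1_step = np.linalg.lstsq(X1, y, rcond=None)[0]    # stepwise estimator of beta1
p = np.linalg.lstsq(X1, x2, rcond=None)[0]         # partial coefficients of x2 on each x_i

for i in range(3):
    verdict = "overstated" if p[i] * b2 > 0 else "understated"
    print(f"x_{i + 1}: p = {p[i]:+.3f}, b1_step - b1 = {b1_step[i] - b1[i]:+.3f} -> {verdict}")
```

In this sample the rule is exact because it is an identity. Statements about bias in expectation replace b2 by β2 and treat the regressors as fixed.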





4 See H. Theil, “Specification Errors and the Estimation of Economic Relationships,” Review of the International 
Institute of Statistics, 25 (1957), 41-51, especially Theorem 1 on p. 43. 

5 See Freund, Vail, and Clunies-Ross, op. cit., equation (5.1), p. 102, and Goldberger and Jochems, op. cit., 
equation (2.6), p. 106. 





NOTES ABOUT AUTHORS 


FRANCIS JOHN ANSCOMBE, 43, is Professor of Mathematics, Princeton Uni- 
versity. Biographical notes appear on p. 737 of the September, 1961 issue in connection 
with Anscombe’s article, “Estimating a Mixed-Exponential Response Law.” 

EARL JENNINGS BELL, 31, became Research Assistant in the Operations Research
Center, University of California, Berkeley, on his return in 1960 from a year of
graduate work at the London School of Economics. He had previously studied Political 
Science (B.A., 1952) and Engineering (B.S., 1957) at the University of California. From 
1957-59 he was employed as a Petroleum Engineer by the Creole Petroleum Corporation, 
Caracas, Venezuela. 

AGNES PAULA HOLLO BERGER has been Assistant Professor in the Division of 
Biostatistics of the School of Public Health and Administrative Medicine of the Faculty 
of Medicine of Columbia University, since 1957. After receiving her Ph.D. in Mathematics 
at the University of Budapest in 1939, she studied Statistics at Columbia under Neyman 
and Wald. Earlier papers by Mrs. Berger have appeared in the Proceedings of the American 
Mathematical Society and in the Annals of Mathematical Statistics. 

OTIS DUDLEY DUNCAN, 40, is Professor of Human Ecology at the University 
of Chicago. Since 1951 he has been on the staff of the University’s Population Research 
and Training Center and a member of the Sociology Department. Before going to Chi- 
cago, he taught at the University of Wisconsin and Pennsylvania State University. He 
has written several articles and co-authored several recent books in the areas of human 
ecology, social stratification and change, and social statistics. Duncan took his Ph.D. 
in Sociology at the University of Chicago in 1949 after previous study at Louisiana State
University (B.A., 1941) and the University of Minnesota (M.A., 1942). 

MOHAMED-ABDEL RAHMAN EL-BADRY, 41, has been on leave from his 
position as Associate Professor of Statistics, Faculty of Economics, Cairo University 
since 1959 to serve in the Demographic Training and Research Center, Bombay, India. 
From 1948-52 he was a Statistician in the Ministry of Social Affairs, Egypt. El-Badry 
received his B.S. in Mathematics at Cairo University in 1945, and his DIC and Ph.D. in
Statistics at the University of London in 1948. He worked on problems of sampling and 
demography as a research fellow at Princeton from 1953-55. He has written two statistics 
texts in Arabic and a number of articles and bulletins in the areas of demography, theo- 
retical statistics, and sampling. 

JOHN LEROY FOLKS, 32, received his Ph.D. in Statistics from Iowa State Uni- 
versity in 1958 after having majored in Mathematics at Oklahoma State University. 
He worked for Texas Instruments Incorporated from 1958 to January, 1961, first as 
Operations Research Engineer and then as Manager of Operations Research. He became 
Associate Professor of Mathematics at Oklahoma State University early this year. Folks’ 
articles on design of experiments and quality control have appeared in Biometrika,
Biometrics, and Industrial Quality Control.

JOHN E. FREUND, 40, served as Professor of Statistics at Virginia Polytechnic 
Institute and Professor of Mathematics at Alfred University before taking his present 
position, Professor of Mathematics at Arizona State University, in 1957. He received 
his Ph.D. in Mathematics at the University of Pittsburgh in 1952 after earlier training at 
the University of California, Los Angeles. Freund is the author of Modern Elementary 
Statistics (Prentice-Hall, 1952, 1960), and (with F. J. Williams) Modern Business Statistics 
(Prentice-Hall, 1958). A new book, Mathematical Statistics, will be published next year. 

ARTHUR STANLEY GOLDBERGER, 31, is Associate Professor of Economics, 
University of Wisconsin. Biographical notes appear on p. 160 of the present volume of 
JASA in connection with Goldberger’s joint article with Jochems, “Note on Stepwise 
Least Squares,” which appeared in the March, 1961 issue. 

LEO A. GOODMAN, 33, returned last fall to his position as Professor of Statistics 
and Sociology at the University of Chicago. He spent the 1959-60 academic year as a 















Guggenheim and National Science Foundation Fellow at Cambridge University and the 
London School of Economics and Political Science. In 1960-61 he was Visiting Professor of 
Mathematical Statistics and Sociology at Columbia University. Goodman has published 
over forty articles in the following books and journals: American Sociological Review, 
American Journal of Sociology, Proceedings of the American Mathematical Society, Bulletin 
of Mathematical Biophysics, Annals of the Institute of Statistical Mathematics (Tokyo), 
Social Forces, Biometrics, Psychological Bulletin, Decision Processes (edited by R. M. 
Thrall, C. H. Coombs, and R. L. Davis), Biometrika, Psychometrika, Sankhyā, Annals of
Human Genetics (London), Modern Language Notes, Teoriya Veroyatnostei i ee Primeneniya
(The Theory of Probability and its Applications) (Moscow), The Language of Social
Research (edited by P. F. Lazarsfeld and M. Rosenberg). 

PHYLLIS A. GROLL, 25, is a graduate student in Statistics at Stanford University. 
Following graduation with a degree in Mathematics from Cedar Crest College in 1957, she 
served as Senior Technical Aide at Bell Telephone Laboratories until 1960. A previous 
article (with Milton Sobel), “Group Testing to Eliminate Efficiently All Defectives in a 
Binomial Sample,” appeared in the Bell System Technical Journal for September, 1959. 

SHANTI SWARUP GUPTA, 35, has been a member of the Technical Staff of Bell 
Telephone Laboratories since 1958. He also serves as Adjunct Associate Professor of 
Mathematics in the Institute of Mathematical Sciences, New York University. Cur- 
rently, on a leave of absence from the Bell Laboratories, he is visiting Associate Professor in 
the Department of Statistics at Stanford University for the academic year 1961-62. Gupta 
received his B.A. and M.A. degrees in Mathematics from the University of Delhi and his 
Ph.D. (1956) in Mathematical Statistics from the University of North Carolina. He has 
taught at Delhi College and the University of Alberta. He is a contributor to the recent 
volume in honor of Harold Hotelling (Contributions to Probability and Statistics, Stanford
University Press, 1960), and his articles have appeared in the Annals of Mathematical 
Statistics, Biometrika, and Technometrics. 

WILLIAM M. HAENSZEL, 51, has been on the staff of the National Cancer 
Institute since 1952 and is currently Chief of the Biometry Branch. He has taught at 
Yale University and the University of Buffalo and has served in the New York and Con- 
necticut Departments of Health. Most of Haenszel’s publications have been concerned 
with the application of statistical methods to investigations in the fields of public health, 
vital statistics, and cancer epidemiology, and have appeared in the American Journal of 
Public Health, Public Health Reports, Journal of Chronic Diseases, and the Journal of the 
National Cancer Institute.

ROBERT VINCENT HOGG, 37, wrote an article, "Certain Uncorrelated Statistics,"
for the June, 1960 issue of the Journal. His biographical note appears on p. 349 of that
issue. 

NORMAN LLOYD JOHNSON, 44, has returned to his post as Reader in Statistics, 
University College, London after spending the 1960-61 academic year as Visiting Professor 
of Mathematics at Case Institute of Technology. Educated at the University of London 
(Ph.D. in Statistics, 1948), Johnson has taught at University College since 1938 except 
for a period (1939-45) as Experimental Officer with the British Ordnance Board and a 
year (1952-53) on the statistics staff at the University of North Carolina. His numerous 
articles have appeared in Annals of Mathematical Statistics, Biometrics, Biometrika, 
Journal of Institute of Actuaries, Journal of Institution of Electrical Engineers, Journal of 
the Royal Statistical Society, Metron, and Trabajos de Estadística. He is co-author (with
H. Tetley) of Statistics: An Intermediate Text-Book, Two Volumes, Cambridge University 
Press, 1949-50, and author of Manual of Statistical Exercises (Part II. Analysis of Variance 
and Associated Techniques), London: University College, 1957. Johnson became a Fellow 
of the Institute of Actuaries in 1949. 

RUTH WINN LEES received her B.A. in Psychology from Northeastern University 
in 1953 and her Master of Education in Educational Measurement and Statistics from 
Harvard in 1956. Since December, 1960, she has been attached to the AEC Computing 
Center of New York University. 







FREDERIC MATHER LORD, 49, is Research Associate, Educational Testing 
Service, Princeton, New Jersey. Biographical notes appear on p. 481 of the June 1959 
JASA in connection with his article, “Problems in Mental Test Theory Arising from 
Errors of Measurement.” 

JOHN MANDEL, 47, Statistician in the National Bureau of Standards since 1947, 
studied Chemistry at the University of Brussels (B.S. in 1935, M.S. in 1937) before taking 
graduate work in Mathematics and Mathematical Statistics at Brooklyn College and at 
Columbia University. He worked as a Research Chemist, Société Belge de Recherches et 
d’Etudes, Ghent, Belgium from 1938-40 and spent several years as an industrial chemist 
in the United States before joining the Bureau of Standards. Besides JASA, Mandel has 
written articles for the Journal of Research of the National Bureau of Standards, Biometrics, 
Journal of Colloid Science, Analytical Chemistry, Annals of Mathematical Statistics, and 
Technometrics. 

ANIRUDH LAL NAGAR, 31, is Ford Foundation Visiting Professor in the Institute 
for Quantitative Research in Economics and Management, Purdue University. Prior to 
accepting this assignment last July, he served four years as Research Associate at the 
Econometric Institute of the Netherlands School of Economics. Nagar received his 
Ph.D. in 1959 from the Netherlands School of Economics after having studied Mathe- 
matics, Economics, and Statistics at the Lucknow University, India. He has authored 
articles on Simultaneous Equations Estimation which have appeared in Econometrica. 

LEE EGAN PRESTON, JR., 31, studied Economics as an undergraduate at Vander- 
bilt University and as a graduate student at Harvard. Since taking his Ph.D. at Harvard 
in 1958, he has been Assistant Professor of Business Administration at the University of 
California, Berkeley. He is the author of Exploration for Non-Ferrous Metals: An
Economic Analysis (Washington: Resources for the Future, Inc., 1960).

SHANTILAL MANGALDAS SHAH, 33, wrote an article, “A Note On Griffin’s 
Paper ‘Graphic Computation of Tau as a Coefficient of Disarray,’” which was published 
in the September, 1961 issue of the Journal. Biographical notes appeared on p. 739 of that 
issue. 

MONROE GILBERT SIRKEN, 40, has been Chief of the Actuarial and Surveys 
Branch of the National Office of Vital Statistics since 1956. After a year as Post Doctoral 
Fellow in Mathematical Statistics at the University of California, Berkeley, he was Mathe- 
matical Statistician in the Bureau of the Census from 1951-53 and in the National Office 
of Vital Statistics from 1953-56. Sirken received the Ph.D. in Sociology from the 
University of Washington in 1950 after completing his B.A. and M.A. in Sociology and 
Anthropology at the University of California, Los Angeles. Sirken is the author of several 
articles, bulletins, and contributions to books in applications of statistics to demography 
and epidemiology and in sample survey methods. 

H. O. STEKLER, 28, is Assistant Professor of Business Administration at the Uni- 
versity of California, Berkeley. He received his A.B. (1955) in Economics and Mathe- 
matics, Clark University and his Ph.D. (1959) in Economics at Massachusetts Institute 
of Technology. He joined the Berkeley staff in 1959. Stekler has served brief periods as a 
consultant to the Rand Corporation and to Melpar, Inc. 

KARL ERNST TAEUBER, 25, became Research Associate in the Population Re- 
search and Training Center and Assistant Professor in the Department of Sociology, Uni- 
versity of Chicago, last September after two years’ service in the Biometry Branch, 
National Cancer Institute. Taeuber received his B.A. in Sociology in 1955 from Yale and 
his M.A. and Ph.D. (1960) from Harvard in the same field. He is primarily interested in 
demography and human ecology. His previous articles are in these areas. 

HENRI THEIL, 37, is Professor of Econometrics and Director of the Econometric 
Institute, Netherlands School of Economics. Since receiving his Ph.D. from the University 
of Amsterdam in 1951, he has worked at the Central Planning Bureau, The Hague, and 
the Econometric Institute except for visits to The University of Chicago, Stanford Uni- 
versity and Harvard University in 1955-56, the summer of 1959 and the spring of 1960. 
Theil’s books, Linear Aggregation of Economic Relations, Amsterdam: North-Holland, 










1954 and Economic Forecasts and Policy, Amsterdam: North-Holland, 1958, 1961, are well 
known to students of economic statistics. 

DEL LON WEST, 27, has served as Mathematical Statistician for Texas Instru- 
ments Incorporated since receiving his M.S. in Mathematical Statistics from Oklahoma 
State University in 1959. Earlier, he took his Bachelor’s in Mathematics at Southeastern 
State College, Durant, Oklahoma in 1957 and held a Graduate Assistantship in Mathe- 
matics at Oklahoma State from 1957-59. 








CORRIGENDA 


Freund, Rudolf J., Vail, Richard W., and Clunies-Ross, C. W., RESIDUAL
ANALYSIS, Vol. 56, No. 293 (March, 1961), 98-104.


James F. Hannan of Michigan State University and D. A. Lyon of the 
Suffield Experimental Station, Alberta, Canada have independently pointed 
out that the Appendix to the paper is unnecessary since the relation (4.7), p. 
101 can readily be established from simpler considerations. 

The inequality i B*>PfE is a consequence of the “least squares” deriva- 
tion of E. Relation (4.7) follows immediately from (3.9), p. 100 and the 
equality in (4.8), p. 101. 

In addition, the following minor misprints have been noted— 

(2.5) and (3.1), p. 99 should each read: “Y =XB+ZC.” 

(3.7), p. 100 should read: "σ²(Z'Z)⁻¹."

The sentence which begins on line 15, p. 101 should start: “Further, if the 
X and Z variables are orthogonal,"

(4.8), p. 101 should read: "E*'E* = Y'Y − Y*'Y* < E'E."

These corrections and the simpler derivation of (4.7) have been checked by 
the authors. 


Siegel, Sidney and Tukey, John W., A NONPARAMETRIC SUM OF RANKS
PROCEDURE FOR RELATIVE SPREAD IN UNPAIRED SAMPLES, Vol. 55, No.
291 (September, 1960), 429-45.


Lloyd S. Nelson, Consulting Statistician of the General Electric Company, 
Nela Park, Cleveland, has observed that some of the critical values of R, given 
in Table 1, pp. 434-40, are smaller than necessary. The revised values tabu- 
lated below have been checked by one of the authors. 


From Table 1, Siegel and Tukey Revised 





Probability, % Prob., % 
+ /0 


Ri Actual 





Table Head Actual 





3.33 5.00 
3.18 5.00 
1.83 2. 
1.76 

0.43 

1.98 

1.97 











BOOK REVIEWS 


Demographic and Economic Change in Developed Countries. National Bureau of Eco- 
nomic Research. Princeton, New Jersey: Princeton University Press, 1960. Pp. x, 536. 
$12.00. 


ALEXANDER GERSCHENKRON, Harvard University 


THIS collection of conference papers, augmented by discussants' comments, is
prepared in the customary style of a National Bureau volume. The authors are
divided between demographers and economists. “The papers are of first-rate quality,” 
says A. J. Coale in his Introduction. Coming from the chairman of the planning com- 
mittee of the Conference and the contributor of an important paper, the statement 
has a curious ring. But it is quite correct. Most of the papers, taking each by itself, are 
very good. It is another question what degree of unity attaches to the volume as a 
whole. Conferences of this sort being what they are, modesty of expectation is in 
order. It is indeed fortunate that the area covered turns out to be narrower than 
originally intended. For the title is misleading and reaches out farther than the con- 
tents. This is not really a book on population problems in developed countries in gen- 
eral. The first—demographic—part of the volume contains two papers on countries 
other than the United States: “An International Survey of Recent Fertility Trends” 
by Halvor Gille and an essay on “Differential Fertility in Europe” by Gwendolyn Z. 
Johnson. Both are highly competent and informative pieces, but they cannot change 
the character of the volume. The remaining papers in the demographic part and the 
entire economic part—to the extent that they have an empirical locus—deal almost 
exclusively with the United States. Even so, the material presented is so variegated 
and at times so specialized as to defeat any attempt at a unified review. It must 
suffice therefore to advert to some of the papers and to discuss some of the views and 
conclusions offered. 

It is not surprising to find that in a symposium of this kind central attention is 
directed to the problem of rising fertility in the United States. It is in this area that 
the disconcerting changeability of the rates conspired with the imperfection of the 
tools to falsify the demographers’ predictions. One will appreciate, therefore, Norman 
Ryder’s contribution to this volume in which he describes his ingenious method for 
estimating from data on period fertility the complete fertility of cohorts which are 
still in the childbearing age. But improvement in the ability to predict is only part of 
the problem. Looking back at the completed records one cannot help being struck by 
the discrepancy between the wealth and the solidity of the empirical data and the 
elusiveness of the interpretive problems. Here no doubt lies one of the reasons for the 
weak internal coherence of the volume. The demographer’s craft yields tangible re- 
sults. But the economist’s art deals with factors that are highly uncertain as to their 
relative significance and direction. 

It is understandably tempting, therefore, to try to escape from complex problems
into a simple exercise in economics of choice. That is what Gary S. Becker has done in
his contribution. Children are regarded as a durable good, primarily a consumers’ 
durable. The cost of children is computable essentially as present value of expected 
outlays (with some appropriate additions and deductions); at the same time, this cost
determines the “quality” of children; separating quantity and “quality,” Becker 
arrives at the conclusion that quantity income elasticity for children is small but posi- 
tive, while “quality” income elasticity is large. Thus, comfortingly enough, the de- 
mand for children turns out to be very much like that for motor-cars. If the conclusion 
appears to contradict historical evidence, the reason, Becker assures, lies in the un- 









even diffusion of knowledge regarding contraceptive practices. Once the latter is kept 
constant, a direct relation between income and the number of children appears. 

This is a suggestive little model. It would be attractive to extend it to related areas, 
such as matrimonial (and other) choices, a wife being clearly elected, or rather se- 
lected, as a durable consumers’ good (whereas a mistress might qualify as a semi- 
durable). Still, there are certain dangers in pushing the analogy between a child and 
an automobile too far. In the case of the child, the act of shopping has felicific aspects 
not fully duplicated in buying a car or a refrigerator. This is a fairly familiar fact 
which a theorist will ignore at his own peril in appraising the totality of the choices in- 
volved. Historically seen, many other relevant factors, in addition to contraceptive 
knowledge have varied across a stratified population and over time, drastic differen- 
tials in time horizons being one of them. As Duesenberry has pointed out in his Com- 
ment, the empirical evidence adduced by Becker is far from overwhelming, and the 
assumption of free rational choice among children of different “quality” is hardly 
realistic. Quite apart from social custom and social pressure, there is the problem of 
parental identification with the children. This, too, is a familiar fact, bound to con- 
fuse, if not erase, Becker’s demand and cost schedules. Outlays on, say, a child’s 
winter coat, unlike the cost of putting anti-freeze into a car, can be indistinguishable 
from a father’s expenditure on his own coat or hat. It would not be much more arti- 
ficial to let a man view himself as his own durable consumers’ good. 

Once we move away from the charming simplicity of Becker’s procreative eco- 
nomics, the picture becomes more blurred, but also more significant. One may single 
out A. J. Coale’s article on “Population Change and Demand, Prices, and the Level of 
Employment.” This is probably the central piece in the economic part of the volume. 
Coale examines the effect of changes in population upon the principal components of 
national income. He discusses one by one population growth in relation to the con- 
sumption function, investment other than housing, investment in housing, and, finally 
government expenditures. The author combines theoretical considerations with the
results of other writers’ correlation analyses and serves up an estimate of the hypo- 
thetical employment situation in 1940 under the assumption of the dependency bur- 
den and growth rate of 1957. The result is an increase in demand large enough to ab- 
sorb at least 60 per cent of the unemployment that prevailed in the earlier year; taking 
the effect of government expenditures into account, as much as 90 per cent of un- 
employment may have been wiped out. The contingency of causal effects proceeding 
in the opposite direction cannot be excluded, but Coale’s results agree so well with this 
reviewer’s preconceptions as to be quite safe from criticism. In his Introduction to the 
volume, Coale supplies what amounts to a broadened summary of his own article by 
saying (p. 14): 

“When viewed in aggregate terms, the effect of demographic variables on the economy 
form a paradox of sorts: the growth arising from high fertility increases aggregate 


demand, but reduces the full employment capability of the economy to increase 
output per head.” 


This statement does less than justice to Simon Kuznets’ contribution: “Population 
Change and Aggregate Output.” Kuznets starts from the general impression that at 
least in modern times secular rise of population was generally associated with a 
marked secular rise in per capita output. He does not assert an inevitable causal
connection. Nor does he deny the likely presence of the oft-discussed unfavorable effects
on output. He wishes simply to redress the balance of the debate by theorizing— 
“speculating” is his term—about the possible positive effects of population growth, 
having in mind, as almost everybody else in the volume, primarily growth caused by 
high fertility (and possibly immigration). The discussion proceeds under three heads 










separating the role of the population as producers, savers, and consumers. The in- 
dividual factors adduced are not all equally significant or plausible. One may doubt, 
for instance, the supposition that in any population the share of geniuses and highly 
talented men is constant, so that growing population implies larger absolute numbers 
in the two categories. Once this is granted, however, it does not seem implausible to 
surmise, as the author does, that injection into the economy of men of genius and 
great talent yields increasing returns. In addition to stressing the advantages of big- 
ness in a couple of other areas, Kuznets discusses a number of cases in which growing 
population elicits output-raising responses on the part of workers, savers, and con-
sumers. A reduction of leisure is one of the suggested possibilities. Another is
an interesting saving-increasing sequence via something that may be described as 
“keeping down with the Joneses” on the part of the earners of upper incomes. Milton
Friedman, in his Comment, persists in regarding such effects as merely partial offsets to
the unfavorable consequences of population growth. But Kuznets obviously thinks in
terms of dynamic processes changing the basic “givens” of the economy, such as 
labor’s attitude to work or consumers’ receptiveness to new products. It is a moot 
question whether, and if so to what extent, such effects actually occur. But they can- 
not be refuted or their magnitudes limited by logical reasoning as Friedman has tried 
to do. “Speculations” offered by Kuznets should be regarded as a stimulating and 
challenging program for empirical research. 

In the light of Kuznets’ essay, Coale might well have surrounded his paradox with
some reservations and qualifications. On the other hand, Coale’s own expectation of a 
positive influence of population growth on effective demand finds some further moder- 
ate support in the essay by Jean A. Crockett in which she examines the effects of a 
rising population on demand for food; and a rather strong corroboration in Robert 
Ferber’s study on “Population and Demand for Services.” Yet it is Ferber who, while 
stressing the role of population in stimulating consumers’ expenditures, warns against
assuming an automatic nexus between population and consumption. It seems that 
anything might happen. 

This feeling of uncertainty expressed at the end of the last paper of the volume is a 
fitting conclusion to the 500-odd pages of the symposium. Despite all the efforts, eco-
nomics of demography remains an elusive business and the accumulated stock of re-
liable empirical knowledge appears distressingly small. When it comes to the final 
count, the most significant things said on the subject of “the great fertility reversal 
and the sustained rise since” are contained in the few comments Coale makes in his 
Introduction. And they are very much in the nature of “speculation.” But perhaps 
impatience is out of place. It is possible that we stand at the beginning of a new era. It 
is possible that the future lies with formidable, multidirectional stochastic models 
which operate with the immediate decision units and the solution of which is obtained 
by simulation on large electronic computing machines. A fascinating preliminary view 
of such a model was presented to the Conference by Guy H. Orcutt and Alice M. 
Rivlin in their piece on the Household Sector. But the road is long and arduous. And 
it is not absolutely clear that moving along it will bring us closer to testing some of the 
more alluring hypotheses with which this volume is sprinkled. 


Theory of Industrialism—Causal Analysis and Economic Plans. Johan Akerman. Lund, 
Sweden: C. W. K. Gleerup, 1960. Pp. 332. 25 Swedish Kroner. 


Amartya Kumar Sen, Massachusetts Institute of Technology


This is a book with a mission. “New economic theories are as a rule assumed to be
‘complements’ to a given tradition. . . . In this treatise exception is taken to this 
rule.” (p. 7) The author contrasts “calculation models” with “causal analysis,” and 





BOOK REVIEWS 1009 


maintains that while tradition favours the former, the latter is what he is after. In ex- 
plaining the difference, Professor Akerman points out that “the models of calculation 
describe logically and without reference to a real time-scale the pattern of thought of 
consumers, producers, and the State; the pattern as a rule being expressed by defini- 
tions and inter-relations of standardized concepts.” (p. 273) Professor Akerman wishes 
to replace this kind of “ever-valid” theorization by “causal analysis,” which “explains 
the actual, total processes by a scrutiny of the motives, perspectives, and actions of 
the essential groups and the result of these driving forces as they appear in the varia- 
tions of time-series.” (p. 273) In preferring this goal, one suspects, Professor Akerman
is not as alone as he seems to think, and many economists and most eco-
nometricians will not feel any compulsion to disagree with him. Nor can it be said
that the author’s philosophical tables and highly impressionistic diagrams make the 
distinction between the two approaches much clearer than it already is in the stand- 
ard literature. Judged as a methodological contribution the book appears to this re- 
viewer to be quite banal. 

But luckily this is not all. The earlier part of the book, before Professor Akerman 
goes into the philosophical foundations of the nature of good economics, consists of a 
series of interesting empirical studies of the process of industrialization, taken mainly 
from England, Germany, France, and the United States. There are some very illumi- 
nating discussions of what he calls the “driving forces of industrialization,” ranging 
from “technological progress” and “increase in population” to things like “political 
changes” and “organization of labour.” 

Also of interest is Professor Akerman’s study of business fluctuations. He dis- 
misses “the Kondratieffs” as “artificial products” (p. 116), and “the so-called Kitchin 
cycle or purely economic short cycle can be relegated to the store of unfounded con- 
structions” (p. 169). Of the three, the Juglars or the usual trade cycles occupy most of 
his attention. While he displays an inability to understand the nature of Ragnar 
Frisch’s arguments about the periodicity problem and dismisses them as “quite in- 
conceivable” “from a psychological point of view” (p. 168), Professor Akerman pro- 
vides quite an ingenious defence of the old Jevons type of seasonal fluctuations. He 
puts forward the point of view that “there are certain constant and periodic elements 
such as the season—the basic one-year cycle—and certain institutional periodicities— 
such as the American presidential period—and regard the cycles as periodic move- 
ments influenced by these mechanically working devices, but also moving due to their 
own economic attributes such as the time required for capital-construction, the legal 
requirements of the monetary system, and essential (though still impalpable) psycho- 
logical time-factors.” (p. 174) Professor Akerman does a reasonably good job of ex- 
plaining the different types of fluctuations in Europe and America in these terms; the 
study of American four-year cycles connected with presidential elections is very 
cleverly done. This is, of course, not the same as conceding that Professor Akerman’s 
main thesis carries conviction. While the year to year fluctuations and the four-year 
political cycles are quite convincingly discussed in terms of the periodicity of ex- 
ternal factors, the section on the Juglar or common cycle is a shocking example of 
casual empiricism. Here statistical analysis is very largely replaced by verbal con- 
jectures. This seems to be the weakest link in Professor Akerman’s thesis of external 
causation, and it happens to be an important link. 

This is followed by a reasonably interesting section on the nature of structural 
change in these economies, and by a very poor section on “the principles of economic 
planning,” which does not go much beyond the rather obvious points. And finally 
there is the section on the philosophy of economics that I referred to earlier. This is not 
the most useful or interesting section of this book, and that happens to be an under-
statement.


Productivity and Technical Change. W. E. G. Salter. New York: Cambridge University
Press, 1960. Pp. ix, 198. $4.50. 


Glenn L. Johnson, Michigan State University


This work is divided into two major parts, one conceptual and theoretical, the other
empirical. As would be expected, both the strengths and weaknesses of the em-
pirical portion originate in the conceptual portion.

The basic problem of the book is to analyze “the relationship between productivity 
and technical change.” (p. 1) It is accepted that “only one type of productivity con- 
cept is measurable. This is the concept of output per unit of input.” (p. 2) The prob- 
lems of measurement and interpretation are discussed and many inevitable short- 
comings are recognized. 

The author states that he is “deeply conscious of a gap between the empirical 
(Part II) and theoretical (Part I) approaches. This is largely unavoidable, both be-
cause of the incomplete nature of the theory, and the impossibility of obtaining data
for variables which the theory suggests are significant, such as best practice tech- 
niques and costs.” (p. 9) Vernon Ruttan has given this book a conventional kind of 
review in the Journal of Farm Economics (Vol. 43, February, 1961, p. 160f.) and has 
recognized its excellence. I concur as to the general excellence of the book but, de-
spite this, will hazard some suggestions as to how the gap referred to above between
its theory and empirical work might be partially closed in future work. 

Closing this gap might eliminate some of the unexplained variance responsible for
the inconclusive results summarized in the following quotations from the ends of the em-
pirical chapters. On p. 113, the summary of the inter-industry survey states, “The
interpretative conclusions derived from the analysis should be regarded as highly 
tentative,” “ ... the figures reveal a very great diversity in the experiences of differ- 
ent industries.” On pp. 145 and 146, “The results are consistent” with the hypotheses 
“uneven rates of technical advance between industries when these advances are of the 
type which tend to save labour, capital and materials.” “The results may also be par- 
tially explained by the uneven impact of economies of scale.” “Because of the lack of 
data ..., it is impossible to integrate the explanation with the analysis (theoretical) 
of Chapters IV to VII.” Conclusions about the relationship of structural change to 
productivity were more definite. So were the conclusions about the distribution of 
gains from increased productivity. Chapters VIII through XII were summarized in 
the beginning of Chapter XIII as follows: “A number of important conclusions have 
emerged from the three preceding chapters: technical change and allied economies of 
scale are the causes of increased productivity more consistent with the behavior of 
costs and prices; the gains of increased productivity have been distributed by means 
of prices; and structural changes not only play an important part in increasing pro- 
ductivity but inevitably surround the whole concept with index-number ambiguities.” 
(p. 163) In Chapter XIII the analyses are extended to the American
scene. The summary of this chapter states, “The data reveal a very great diversity in
movements of output and output per man hour, as in the United Kingdom.” “. . .
the evidence for . . . an association (between output per man hour and hourly earn-
ings) must be regarded as inconclusive.”

Closing the gap and clearing up inconsistencies might be accomplished by a more
explicit handling of investments in the durables associated with technological ad-
vances. The present analysis has concentrated on the acquisition prices, salvage
values (for the industry) and life spans of the durables. With perfect knowledge of 
everything but technological advance, the deviations between “best practice” and 
“actual practice” and, hence, between best practice and actual practice productivity 

are accounted for, in considerable part, by (1) rate of technological advance and (2) 
the life spans of investments in previous technology. Economies to scale (if not previ- 
ously utilized) would permit adoption of new technologies before durable investments 
in old technologies are consumed or reduced below salvage values. While the author 
does not appear specifically to assume perfect knowledge of non-technological change, 
the analysis proceeds as if such an assumption had been made. 

Actually, the “real worlds” of both English and American industrialists are char-
acterized by change and imperfect knowledge of many kinds. Wars change both input
and product prices. So do shifts in foreign supplies and demands. Inflations and
deflations occur, making it difficult for industrialists to distinguish earnings from 
capital gains and losses. Institutional changes, too, such as social security and tax
shifts, have their unforeseen impacts. The point is that such unforeseen changes lead
industrialists to overinvest in technologies in much the same way as unforeseen 
technological changes. Each such overinvestment explains some of the divergence be- 
tween “best” and “actual” practice. 

It is strongly suspected by this reviewer that an important amount of the unex- 
plained variation discovered in the study between “best” and “actual” practice and, 
hence, between resultant productivities could be accounted for by theoretical and em- 
pirical work which: 

(1) would take into explicit account these other kinds of imperfect knowledge,

(2) would relate investments to expectations concerning these kinds of knowledge, and

(3) would use these relationships to determine variations in the stock of durable
investments in outdated technologies.


The above suggestions are intended as constructive with respect to an already
competent and commendable work that does credit to both the author and his profession.


Statistical Geography: Problems in Analyzing Areal Data. Otis Dudley Duncan, Ray P. 
Cuzzort, and Beverly Duncan. Glencoe, Illinois: The Free Press, 1961. Pp. vii, 191. $6.00. 


Leslie Kish, University of Michigan


Why do we find the presentation of data in map form so fascinating and reward-
ing? Essentially, placing a set of data on a map is but one special manner of
tabulating and presenting the data to the reader; that is, the data are first sorted into 
areal units and then these areal data are presented on maps instead of printed rows 
and columns. Why does he find reading it in this form so much more rewarding than 
reading an ordinary rectangular printed table? Because he brings to the reading of 
mapped data a wealth of information, derived from many other maps of the same 
area, containing rich stores of other data. He relates the new set of data to the great 
complex storage of other data for the same areas that he carries about in his head. 
Some of the data in that storage refer to the narrowly geographical topics of top- 
ography and the natural resources in materials and energy; others deal with historical 
topics, including the distribution and development of population densities and the na- 
ture of political boundaries; still others relate to cultural traits involving languages 
and literature. The store of knowledge each of us carries is vast and complicated; it is 
also individual and subjective. Hence, our interpretation of mapped data is an art and 
its content is difficult to communicate, to evaluate, to check and verify. 

Desire for objective comparison and communication leads naturally to attempts to 
summarize areal data with a small set of stable statistics of broad applicability. A 
fundamental aspect of areal data is that they are based on units composed of groups of 
population elements. Usually these units are arranged in a hierarchy of types that 
constitute successive partitions of the population; e.g., continents are divided into 
states, states into provinces or regions, then these into counties and smaller units. 
Official statistics are often presented in terms of these units. Statistical techniques 
exist for dealing with units which are aggregates of population elements and these 
techniques can be applied fruitfully to geographical units also. A serious shortcoming 
of established statistical techniques is that they deal easily only with discrete and un-
related units, whereas areal units are not distributed randomly in space but are re-
lated to each other in many complex ways. This shortcoming leads naturally to
searches for special techniques to deal with the special problems of areal units. 

Perhaps we can appreciate this problem better if we first look at the similar but 
simpler problem of dealing with time series. A large and rich portion of statistics has
been created just to deal with a set of data forming a sequence in which the elements 
are related to each other along a single dimension—and usually only in one direction 
in that dimension (if we make an exception for the possible reverse effects of psycho- 
logical expectations and anticipations). The statistics of time series (and the mathe- 
matics of stochastic processes) can illustrate the difficulties introduced into a set of 
data by correlations in a single dimension. How much more difficult it is to construct a 
mathematical model to represent adequately geographical relationships in two dimen-
sions! (For simplicity’s sake the vertical “third” dimension can be projected, as is done
on maps, onto the two dimensions as a variable; let us also disregard the third
dimension that could be introduced with a dynamic view in time depth.) All kinds of
complex relationships and variations exist, with discontinuities caused by the inter-
ruptions of shorelines, mountains, political boundaries, and other kinds of artificial
and natural barriers. The serious mathematical complexities may help to explain why 
statistical geography has been—and probably will long be—neglected by theoretical 
statisticians and so to explain the resultant lack of systematic statistical treatment. 

It is not surprising then that the authors decided to define statistical geography as 
what the statistical geographers have done. They also circumscribed it explicitly in 
three respects: they omit both cartographic techniques and expositions of the prin- 
ciples of statistical methods, and they further concentrate on data pertinent to demo- 
graphic, sociological and economic studies. The book is a “by-product of several years 
of research on areal problems... upon the completion of our contribution to that 
project we seemed to have accumulated a collection of methodological conundrums 
which were worth sharing with other research workers . . . this circumstance occa- 
sioned writing of the present volume. ... ” They made a careful search of the litera- 
ture (in English, at least) on statistical, economic, sociological and geographical 
shelves. If the results fill only 173 small pages of text, it is because there is not much 
more to be found, as the authors imply frankly but carefully, so as to avoid hurting any
feelings.

The techniques described include: Kitagawa’s analysis of change into a component 
for changes in areal distribution plus a component for change in unit means; the 
Robinson and Goodman discussions of “ecological correlation” (i.e., correlations of 
unit means); indexes of concentration and redistribution; some special uses of regres- 
sion; Geary’s “contiguity ratio” for measuring the homogeneity of contiguous units. 
The first two chapters include a discussion of perspectives and an analysis of the na- 
ture and the uses of areal units. 
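
Two of the techniques just listed can be sketched in a few lines. The following is a minimal Python illustration, not code from the book: the helper names kitagawa_components and geary_contiguity_ratio are hypothetical. The Kitagawa split assumes the common symmetric form that averages the population shares and unit means of the two dates, so that the two components add up exactly to the total change in the overall mean; the contiguity ratio assumes simple binary weights (1 for contiguous units, 0 otherwise), with values near 1 indicating no spatial association and values below 1 indicating that contiguous units resemble one another.

```python
# Hypothetical helpers illustrating two techniques named in the review;
# they are a sketch under stated assumptions, not the authors' own code.


def kitagawa_components(p1, m1, p2, m2):
    """Split the change in an overall mean into two additive components.

    p1, p2 -- share of the population in each areal unit at dates 1 and 2
    m1, m2 -- the unit means at dates 1 and 2
    Assumes the symmetric form that averages weights and means over the two
    dates, so the components sum exactly to sum(p2*m2) - sum(p1*m1).
    """
    rows = list(zip(p1, m1, p2, m2))
    # Change in areal distribution, valued at the average unit mean.
    distribution = sum((b - a) * (x + y) / 2 for a, x, b, y in rows)
    # Change in unit means, weighted by the average population share.
    unit_means = sum((y - x) * (a + b) / 2 for a, x, b, y in rows)
    return distribution, unit_means


def geary_contiguity_ratio(values, neighbours):
    """Geary's contiguity ratio c with binary contiguity weights.

    values     -- one observation per areal unit
    neighbours -- dict: unit index -> set of contiguous unit indices
                  (assumed symmetric, and not all values equal)
    """
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)
    # Squared differences over ordered pairs of contiguous units;
    # w counts those ordered pairs (the sum of the binary weights).
    pair_ss = 0.0
    w = 0
    for i, nbrs in neighbours.items():
        for j in nbrs:
            pair_ss += (values[i] - values[j]) ** 2
            w += 1
    return (n - 1) * pair_ss / (2 * w * total_ss)


if __name__ == "__main__":
    # Two units: population shifts toward unit 1 while both unit means rise.
    print(kitagawa_components([0.5, 0.5], [10, 20], [0.25, 0.75], [12, 22]))
    # (2.5, 2.0): the overall mean rises from 15.0 to 19.5
    line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # four units in a row
    print(geary_contiguity_ratio([1, 2, 3, 4], line))  # 0.3, smooth gradient
    print(geary_contiguity_ratio([1, 4, 1, 4], line))  # 1.5, alternating values
```

The symmetric averaging is only one way to make such a decomposition exact; valuing the distribution change at date-1 means instead would leave an interaction term to be reported separately.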

There will be others in the future who will want to apply statistical methods to geo- 
graphic data; they will find this book useful in showing them readily what techniques 
have been used by others. It is more convenient than a list of references that one 
would have to chase through libraries. It is more useful than a book of readings, be- 
cause the authors have done a great deal of the work for us in reducing the previous 
contributions to briefer and simpler presentations. Furthermore, the authors made a 
serious attempt to place these individual contributions within a theoretical frame- 
work. Those who will want to do statistical work with geographic data will be grateful 
to the authors for searching the literature and for making it more comprehensible, 
more available and easier to read. 


Prediction and Optimal Decision. C. West Churchman. New York: Prentice Hall, Inc., 
1961. Pp. xiv, 394. $9.00 trade, $6.75 text. 


Nicholas M. Smith, Research Analysis Corporation*


“It becomes more and more apparent that the so-called ‘methods’ of physical
sciences are not appropriate for measuring the critical aspects of human be-
havior. . . .” Is there no adequate science of measurement or theory of scientific
judgment? This is a recurrent query in Churchman’s latest publication.

His mission in this book is to present an exposition of the fundamental problem of 
the science of values. The orientation is empirical: he concludes that a science of
ethics is feasible but possible only if there is a science that can measure values. The 
book is concerned therefore with a study of the problems of measurement, measure- 
ment verification, and application of these ideas to the measurement of values. Any 
claim for an intention to construct a precise, conceptual model for a methodology for 
the scientific verification of ethical judgments is disavowed; instead the author says, 
“I am more concerned with pointing out the problems we face than with providing 
final answers.” 

An understanding of the book will be made considerably easier if the reader at- 
tempts to familiarize himself at the beginning with the metaphysical viewpoint 
adopted by the author. This viewpoint is not spelled out explicitly until late in the 
volume in the chapter on “Decision Methods of Science.” Here the question of the 
judgment process in the verification of scientific findings is discussed. The methods of 
scientific decision are classified according to a position taken with respect to three 
basic dichotomies: conventional vs non-conventional (i.e., arbitrary (hypothetical) or 
true of nature), formal vs non-formal (i.e., following agreed-upon rules of inference or 
not), and deductive vs inductive (i.e., commitments inferred as the consequents of or as 
the antecedents to primal judgments). 

Churchman classifies his own viewpoint as primarily non-conventional, non-formal, 
and deductive. He explains, “the interest of the members of this group is in ‘true’ 
axioms and true conclusions. Furthermore, no stress is laid on precise methods of de- 
duction and the construction of artificial languages. Instead, the group borrows terms 
from common languages, and the meanings of the terms are supposedly clear to the 
group members. The group may also coin some terms or construct symbols which are 
defined within the original, supposedly clear language.” 

Churchman’s metaphysics direct him, in seeking the truth in nature, toward em-
piricism, although he maintains that empiricism is not the only source of truth.
Values are facts—whatever their source—that are subject to scientific discovery and 
verification like any other facts. Thus his chief concern is with measurement and veri- 
fication through prediction. In behavioral language one says “decisions imply values.” 
Decisions constitute observables from which values are deduced as consequents. A 
theory that can predict is adequate, he maintains, because having thus verified the 
true values, the prescriptive side of decision (values imply decisions) is therefore only 
methodological. Although he admits the formalists have a mission in the development 
of theory, he views with suspicion and distrust the semantic connotation that par-
ticular hypothetical (arbitrary) models are in fact descriptions of nature simply be-
cause of labels such as “inventory theory,” “waiting line theory,” etc. 


* Formerly Operations Research Office.

Churchman’s non-formal approach is a source of great difficulty to the reader, who 
must bear the greater portion of the burden of communication. He must make a con- 
centrated effort to pay less attention to what the author says in order to interpret 
what the author means. He must also wrestle with unusual usages of common terms 
“borrowed” from everyday or from technical language. This practice does much to
distract the reader from the significant message of the work. 

The central problem to which the author devotes his attention is the problem of 
measurement—of acquiring factual information. The determination of a fact, he 
points out, is fundamentally a decision process. One must decide what observations 
to make. Furthermore, since it is impossible to completely control all of the environ- 
mental factors of observations, one must decide on a standard of measurement and 
upon a means of correcting off-standard observations to standard conditions. All of 
these processes entail a theory. The experimentalist then is unavoidably entangled in 
a theory of the phenomena he seeks to measure. “In other words,” he says, “theories 
of science are not tested by a set of data arrived at independently of the theory itself.” 

Since decisions—the raw material of value determination—involve both values and 
probabilities, Churchman examines the measurement of probability as a decision 
process. From this viewpoint he is led to try two definitions of probability of an event. 
First he considers probability as a relative frequency—or more precisely, “the prob- 
ability that an item belongs to a subclass is the ratio of the number of members of the 
subclass to the total members of the reference class.” This definition he criticizes 
since decisions must be made as to the characteristics of the reference class and as to 
whether a particular occurrence is or is not a member of the given subclass. This and 
other difficulties relative to small samples lead to a second definition: “The probability 
of an event is the degree to which it is well substantiated in the beliefs of ‘experts.’ ” 

He then turns to the questions of setting up standards for value measurement and 
immediately encounters an inconsistency. Since both probabilities and values are in- 
volved in decision procedures, it is necessary to reduce the uncertainty in measure- 
ment of values by making certain that the subject is aware of the “true” probabilities 
with which his actions lead to outcomes. 

Churchman is forced into the use of true probabilities by the nature of the von 
Neumann axioms of utility—which are all expressed in terms of known (hence true) 
probabilities; as are decision theories based on stochastic definite models. He con- 
cludes, rightly I believe, that present value measurement prescriptives are not ade- 
quate for their task. Churchman evades this deficiency by declaring that the role of 
science is to provide good measurements of probabilities. The value-scientist would 
then be able to make good estimates of his subject’s values and to verify these esti- 
mates by predicting his subject’s behavior. 

From Churchman’s viewpoint prediction (the ability to predict actions) connotes 
“ought.” The ability to predict “verifies” the true nature of a subject’s values. Thus if 
the subject is rational, Churchman argues, he ought to behave consistently with the 
true values pointed out to him. 

The logical point that Churchman misses in asserting that it is possible to know a 
subject’s “true” values is the same one that Newton missed in his presentation of 
“the” laws of motion in Principia Mathematica (1687). The long history of successful 
application of the Newtonian theory of mechanics, together with its determinism, 
has frequently blinded men to the logical fallacy in the contention that “true” axioms 
can be “deduced” from nature. The point is that no test for scientific “validity” within 
an object system can do any more than demonstrate for the moment the sufficiency 
of scientific theory. Necessity (uniqueness) cannot be demonstrated; nor can future 
sufficiency. Thus one has no way to demonstrate that a deduced principle is true of 
nature; certainty is never demonstrable. 

Although one readily admits the primary points of Churchman’s thesis—that a 
measurement entails a theory and that present theories are inadequate for value meas- 
urement—the metaphysical basis of his work is inadmissible. The consequences of 
the logical point made above lead to an epistemology very different from
Churchman’s. Since it is inappropriate to elaborate an alternative viewpoint here, this
reviewer is offering a commentary for publication elsewhere. 

With respect to the remainder of the book, one may observe that Churchman, in 
rigorously restricting himself to a value-science (i.e., measurement of other people’s
values by observation of their decisions—an “extensive” inquiry) has shut himself 
away from a wide and fertile field of investigation: that of inquiry into one’s own 
values for decision purposes (“intensive” inquiry). He is, instead, led into a radically 
positivistic attitude, which results in the omission of all prescriptive inquiry. He is led 
to express a fear, which I do not share, that the value-scientist inherently faces an un- 
appealing disjunction: either he may be placed in the dangerous position of knowing 
his client’s values better than the client knows them, or conversely, he may “unearth 
a tragic truth about man: mankind has no morality.” 


Stochastic Population Models in Ecology and Epidemiology. M. S. Bartlett. New York: 
John Wiley and Sons, Inc., 1960. Pp. x, 84. $2.00. 


Norman T. J. Bailey, University of Oxford


This book is a very welcome member of the new series of Methuen Monographs on
applied probability and statistics. It gives an extremely useful introduction to the 
application of stochastic process theory to a selected range of topics in the study of 
biological populations. The exposition is lucid, though necessarily compressed, and a 
first reading would probably not be easy for anyone having no prior acquaintance 
with the handling of stochastic processes.

Chapter 1 contains a number of observations on the use of statistics in biological
studies. Emphasis is placed on the role played by mathematical models of the phe- 
nomena involved, and on the importance of probability formulations leading to 
stochastic, rather than deterministic, developments. 

The second chapter deals with various frequency distributions which may arise for 
single species, such as the Poisson, negative binomial or modified logarithmic dis- 
tributions. In the next chapter there is a discussion of the basic theory of birth-and-
death processes, with an indication of how extensions can be made to situations in-
volving two kinds of individuals or age-dependent birth-and-death rates. Chapter 4
provides a more detailed account of the problems of growth and interaction. There is, 
in particular, a useful discussion of the setting up of artificial realizations of a process 
and the use of Monte Carlo methods of investigation where the theoretical model is 
mathematically intractable. Chapter 5 is devoted to the problem of competition be- 
tween two species. 
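
The artificial realizations and Monte Carlo methods mentioned above can be illustrated with the simplest case, a linear birth-and-death process. The sketch below is not Bartlett's own code; the function names and parameter values are illustrative assumptions. It generates realizations by drawing exponential waiting times between events and estimates the chance that the population dies out, which can be checked against the standard result (μ/λ)^n0 for a process whose per-individual birth rate λ exceeds its death rate μ. The deterministic counterpart n0·exp((λ − μ)t), by contrast, never reaches zero.

```python
import math
import random


def simulate_birth_death(n0, birth_rate, death_rate, rng, t_max=math.inf, cap=1000):
    """One artificial realization of a linear birth-and-death process.

    Each of the n individuals gives birth at rate birth_rate and dies at
    rate death_rate, so the next event follows an exponential wait with
    rate (birth_rate + death_rate) * n. Stops at extinction, after t_max,
    or once the population reaches cap. Returns a list of (time, size).
    """
    t, n = 0.0, n0
    path = [(t, n)]
    while 0 < n < cap:
        total_rate = (birth_rate + death_rate) * n
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # The event is a birth with probability birth_rate / (birth_rate + death_rate).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        path.append((t, n))
    return path


def extinction_fraction(n0, birth_rate, death_rate, runs, seed=1, cap=50):
    """Monte Carlo estimate of the probability that the population dies out.

    Reaching cap is treated as escape; with birth_rate > death_rate the
    chance of later extinction, (death_rate / birth_rate) ** cap, is negligible.
    """
    rng = random.Random(seed)
    extinct = sum(
        simulate_birth_death(n0, birth_rate, death_rate, rng, cap=cap)[-1][1] == 0
        for _ in range(runs)
    )
    return extinct / runs


def deterministic_size(n0, birth_rate, death_rate, t):
    """Deterministic counterpart: n(t) = n0 * exp((birth_rate - death_rate) * t)."""
    return n0 * math.exp((birth_rate - death_rate) * t)


if __name__ == "__main__":
    for n0 in (1, 3):
        estimate = extinction_fraction(n0, 1.0, 0.5, runs=4000)
        print(n0, estimate, (0.5 / 1.0) ** n0)
```

With a single founder (n0 = 1, λ = 1, μ = 0.5) roughly half of the realizations die out, although the deterministic model predicts steady exponential growth from every starting size; this is the kind of divergence at small numbers that makes the stochastic formulation worth its extra complexity.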

The remainder of the book is confined to the theory of epidemics. Chapter 6 dis- 
cusses the basic models; the next chapter gives a good account of recurrent epidemics; 
and the final chapter introduces the reader to the exceptionally difficult problems 
arising when a spatial distribution of cases is considered. The short bibliography gives 
the chief references required for a more detailed study of the considerable literature 
that exists. 

The mathematical complexities introduced in this book may cause biologists to en- 
quire what purpose is served by going beyond the older deterministic treatments of 
population phenomena, especially as a rather drastic simplification is usually re- 
quired if a stochastic formulation is to be mathematically tractable. The answer to 
this vital question is that we use the simplest model giving a sufficiently accurate ac- 
count of the salient features of the real process. When numbers are large, it may well 
be that a deterministic model does provide a sufficiently realistic picture: probability 
fluctuations may be relatively small and can be neglected with impunity. 

If, on the other hand, numbers are small enough for probability fluctuations to be 
appreciable, then the properties of a stochastic model may well be very different from 
those of the corresponding deterministic model. For example, the deterministic theory 
of recurrent epidemics involves a damped train of oscillations in the number of cases, 
which is contradicted by observation. The stochastic theory, however, is much more 
realistic and entails an undamped succession of outbreaks, the interval between
outbreaks varying in accordance with a certain frequency distribution.

It has also been objected that many stochastic formulations, while having the ad- 
vantage of including the probability aspects of reality, may sacrifice other important 
characteristics for the sake of tractability. Thus in studying the population growth of 
a species of insect, ecologically important factors such as lack of food, cannibalism, 
over-crowding, etc. may, at least initially, have to be ignored. It is sensible, however, 
to study the simplest models first, and then to introduce additional complications 
later. Thus the stochastic population theory of epidemics discussed in this book by 
Prof. Bartlett shows that this approach affords an insight into epidemiological phe- 
nomena that is unachievable by deterministic means, in spite of a certain degree of 
oversimplification in some of the assumptions. 

In short, this is an excellent book, either for readers requiring an introduction to the 
application of stochastic models to certain population phenomena, or for those who 
wish to revise their knowledge of the specialized topics discussed. 


The Study of Population: An Inventory and Appraisal. Philip M. Hauser and Otis Dudley 
Duncan, Editors. Chicago, Illinois: University of Chicago Press, 1959. Pp. xvi, 864. $15.00. 


George F. Mair, Smith College


THE origin of this volume was a decision by the National Science Foundation that a
pines step in its work would be the assembly of a body of facts about the cur- 
rent status of science in the United States. To accomplish this task in the field of 
demography, the Foundation recruited editors for the volume and provided financial 
support for the work, including remuneration for its more than two dozen authors. 
The result is a formidable compendium of over 800 two-column pages. 

The body of the book, consisting of separate chapters by leading workers in de- 
mography and related fields, is organized under three headings. A section entitled 
“Development and Current Status of Demography” begins with a substantial history 
of the field by Lorimer and then offers eight chapters on the state of affairs in selected 
individual countries or areas. “Elements of Demography,” the largest section, pre- 
sents a discussion of world demographic data and individual presentations of pro- 
cedures and findings of main areas of demographic research: fertility, mortality, work- 
ing force, and the like. The final section, “Population Studies in Various Disciplines,” 
considers relationships between demography and seven associated fields of study. The 
twenty-eight chapters comprising these three sections are set in a generous frame of 
introductory material prepared by the editors. Each section has a brief introduction of 
its own. The three sections are preceded by a fourth, Part I, “Demography as a 
Science,” which devotes four chapters to a brave attempt to summarize the materials 
of the contributing authors and to setting forth the editors’ own views of demography 
as a science and a profession. Part I, in turn, is condensed into a single chapter of 
“Overview and Conclusions” with which the book begins. This arrangement is some- 
what redundant, but it does allow the reader a choice as to the intensity with which he 
wishes to pursue the appraisal of demography. 

A somewhat unusual feature of the editing is the degree to which the editors 
make explicit the fact that some of the authors did not precisely adhere to their in- 
structions and thus rendered the book rather less consistent and more repetitious 
than had been intended. [At one point (p. 41), the editors seem almost to be writing a 
review of their own volume.] For the purpose for which the book was written, a de- 
scription of the condition of demography, this is probably a good feature, since it 
helps avoid an artificial appearance of agreement among individuals whose opinions 
differ. 

Although the editors early state that they “need not begin by raising the question of 
whether demography is a science” and will instead “indicate . . . what kind of science 
it is” (p. 29), the reader fairly often gets the impression that both the editors and at 
least some of the authors feel it their duty to make clear to possible doubters that 
demography is indeed a science: it may suffer from the disadvantage that it must ob- 
serve rather than experiment, but so do other established sciences; it may not be the 
most advanced science, but neither is it the least; demographers almost universally 
recognize the difference between scientific activity and social engineering. In short, 
“demography is a science because it embodies all the essential elements of scientific 
outlook and method” (p. 20). Whether demography is a science is an important ques- 
tion for the National Science Foundation—and the reviewer joins the editors and au- 
thors in the hope that the case has been sufficiently well made. For what other readers 
and in what other contexts will the book have value? Unfortunately, it seems unlikely 
that large numbers will find it possible to digest the whole document. 

A complete novice, with little social science or demographic training, will find much 
of the book too advanced. For graduate students learning the field, on the other hand,
the chapters on the elements of demography will be useful, particularly in the next 
few years, before they become out-of-date. Professional demographers will find
materials of interest in the same chapters, especially for topics with which they are less
familiar. For those willing to undertake the reading, the chapters on relationships 
with other disciplines may be helpful as a reminder of some of the ramifications of 
what they are doing. It must be admitted, however, that the presentation of a few of 
the chapters will deter some. For example, few but specialists in the area will take the 
time to master the symbolic language of Spengler’s “Economics and Demography,” 
though such as do will find a broad framework for organizing demographic questions. 

Statisticians will find matter of various sorts to interest them. The history and re- 
gional status of demography are, in part, elements of the history and status of sta- 
tistics. Methods of measurement and some empirical findings are presented in numer- 
ous chapters. There is a significant observation (p. 12) that no major university 
statistics center has developed a program in demography. 

On balance, the editors are to be congratulated for assembling in a relatively short 
time so fine a selection of papers. The volume suggests that demography is a field 
whose central focus is clear but whose borders are vague; that demography has done 
much careful work and operates in a scientific frame of reference; that by the criterion 
of the ability to explain and to predict demography has not yet come too far. These 
are not startling propositions, but in the course of offering documentation for them 
the book collects into a single place a great deal of useful information, not the least 
part of which are the extensive bibliographies that accompany many of its chapters. 


Proceedings of the Conference on Consumption and Saving, Volumes I and II. Irwin
Friend and Robert Jones, Editors. Philadelphia: Wharton School of Finance and
Commerce, University of Pennsylvania, 1960. Vol. I, Pp. xxiv, 480. Vol. II, Pp. ii, 498. No
price listed. 


E. Scott Maynes, University of Minnesota


THE list of contributors to these two volumes reads like a “Who’s Who in Consumption
Economics.” Thus it is not surprising (and this differentiates it from any recent
work) that its contents provide a fairly accurate picture of the current state of
consumption economics: the range and quality of its data, the behavior it seeks to ex- 
plain, the nature of its contending theories, its data-collection and statistical meth- 
odologies. To all it will be clear that consumption economics is in an unhappy state of 
flux, suffering from a surfeit of partially satisfactory data only partially digested. The
sample surveys of the early 1950’s, though they spewed forth titillating data, have 
proved less than wholly satisfactory. The permanent income hypothesis, though pres- 
ently dominant, has not conquered all and many in the consumption fraternity seek 
and await a new synthesis. 

Most of the papers in these volumes are the outcome of a deliberate, systematic 
effort to “mine” for scientific purposes the data of a single sample survey: the 1950 
BLS Survey of Consumer Expenditures. This survey, conducted for the immediate 
purpose of revising the weights in the Consumers Price Index, collected detailed 
quantitative information for all categories of income, consumption and saving from 
12,500 households, a cross-section of urban United States. Much of the credit for the 
systematized exploitation of the 1950 BLS Survey should go to Irwin Friend, re- 
search entrepreneur extraordinary, co-author of several papers and co-editor of the 
Proceedings. 

The uninitiated—whose prime interest is to obtain an overview of research and 
theories relating to consumer behavior—should be reminded that these volumes are 
confined to the research of economists. Sociologists, psychologists and others have con- 
cerned themselves with consumer behavior. An overview of their approaches to con- 
sumer behavior is accessible in the just-published papers of another conference on con- 
sumer behavior: Nelson N. Foote (editor), Household Decision-Making (New York 
University Press, 1961). 

As to organization, papers in Volume I of the Proceedings deal with demand studies 
of major components of consumption; those in Volume II deal with saving and its 
components. Friend’s Introduction, which summarizes all papers in considerable de- 
tail, is useful as a preview and review. 

A listing of authors and titles conveys a good notion of the subject-matter of the 
conference: 


Author(s)           Title of Paper

Volume I:

Crockett-Friend     A Complete Set of Consumer Demand Relationships
Danière-Gilboy      The Specification of Empirical Consumption Structures
Brady               Quantity and Quality of Clothing Purchases
Houthakker-Haldi    Household Investment in Automobiles: An Intertemporal Cross-Section Analysis
Lippitt             Determinants of Consumer Demand for House Furnishings and Equipment
Peters              A Covariance Analysis of Locational Relationships in Furniture-Home Furnishings Expenditures
Crockett            Demand Relationships for Food
Hamburg             Demand for Clothing
Maisel-Winnick      Family Housing Expenditures: Elusive Laws and Intrusive Variances
Ferber              Service Expenditures at Mid-Century

Volume II:

Watts-Tobin         Consumer Expenditures and the Capital Account
Modigliani-Ando     The “Permanent Income” and the “Life Cycle” Hypothesis of Saving Behavior: Comparison and Tests
Bodkin              Windfall Income and Consumption
Friend-Schor        Who Saves?
Klein               Entrepreneurial Saving
Friend-Jones        The Concept of Saving
Kravis              Relative Income Shares in Fact and Theory
Miner               Consumer Personal Debt: An Inter-Temporal Cross-Section Analysis
Lamale              Methodology and Appraisal of Consumer Expenditure Studies


The economics of space and the strategic role assigned to saving by economic theory 
lead me to confine my detailed comments primarily to the papers on saving. This 
procedure denies neither the excellence nor the attractiveness of papers not discussed 
directly. 

One major issue for this conference is the quality of cross-section data on saving. 
Friend-Schor reach the conclusion that saving data collected from single period sam- 
ple surveys are very bad, so bad that “few definitive statements can be made about the 
saving behavior of different income groups in the population.” This appraisal applies 
to both the 1950 BLS-Wharton Survey and the Surveys of Consumer Finances, basic
source of earlier analysis of saving. Friend-Schor’s conclusions rest on a component- 
by-component comparison of survey saving aggregates with adjusted S.E.C. esti- 
mates. One might dissent from some of Friend-Schor’s procedures. For example, one 
might object that Friend-Schor are somewhat naive in applying sophisticated adjust- 
ments to basically crude data (based on answers from memory to relatively global 
questions). Nonetheless, in my judgment any fair-minded person would conclude 
with Friend-Schor that the survey data are inadequate. 

However, survey saving data are not as bad as a literal reading of Friend-Schor 
would imply. De facto, Friend-Schor and their colleagues in these volumes take the 
view that survey data are sufficiently good to discriminate between high savers and 
low savers even though they may not yield accurate estimates of average propensities 
to save. Were it otherwise, the efforts of these researchers, the staging of this
conference, and the publication of these volumes would not have been justified.

The “permanent income” hypothesis dominated this conference. Though only two 
papers dealt directly with it, almost all made ex ante or ex post (after criticism) bows 
in this direction. Bodkin’s paper strikes at one of the two main propositions of the 
permanent income hypothesis, the relationship between transitory income and con- 
sumption. Identifying 1950 “GI” life insurance dividends as transitory income, 
Bodkin finds a marginal propensity to consume out of dividends (postulated as zero
by the Friedman theory) not significantly different from the marginal propensity to
consume out of ordinary income. Friedman in rebuttal argues that (1) the size of 
dividends understates the extent of windfall income (since more is expected later) and 
(2) dividend payments may be a proxy for permanent income. Replying, Bodkin ob- 
jects that recipients had no knowledge of possible future dividends and had received 
none prior to 1950. Interestingly enough, a recent note by Kreinin gives an opposite 
result from Bodkin’s. In the case of another form of transitory income, restitution 
payments to Israeli households, Kreinin found a marginal relationship close to zero. 
This last piece of evidence is consistent with Duesenberry’s comment at this conference
that different types of transitory income may be regarded differently and disposed
of differently.

The Modigliani-Ando paper deals with another major proposition of the permanent 
income-life cycle models, the notion that the relationship between consumption and 
permanent income is one of proportionality. After developing rigorously the impli- 
cations of both the life cycle and permanent income models, Modigliani-Ando at- 
tempt to obtain unbiased estimates of the permanent income elasticity of total con- 
sumption by computing regressions of the log of mean consumption on the log of mean 
income over cells defined by appropriate criteria other than income (e.g., city size, edu- 
cation, etc.). In what they regard as their most appropriate test, households are classi- 
fied by total housing expenditures, an instrumental variable used in place of perma- 
nent income, and then the relationship between consumption and income (in logs) is 
calculated. This and several other tests, though by no means all, tend to support
the proportionality relationship. 
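The logic of the grouping test can be illustrated with synthetic data. The sketch below is a hypothetical illustration, not the authors' computation: it assumes that consumption is exactly proportional to permanent income, that measured income adds an independent transitory component, and that housing outlay tracks permanent income but not transitory income. All parameter values are invented. Under these assumptions, the household-level regression of log consumption on log measured income gives an elasticity well below 1, while ranking households into cells by housing outlay and regressing the log of cell-mean consumption on the log of cell-mean income gives an elasticity near 1.

```python
import math
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, fitted with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20_000, cells=10, seed=0):
    """Return (household-level elasticity, grouped elasticity) on synthetic data."""
    rng = random.Random(seed)
    households = []
    for _ in range(n):
        log_yp = rng.gauss(8.0, 0.5)                    # log permanent income
        income = math.exp(log_yp + rng.gauss(0, 0.3))   # measured = permanent x transitory
        consumption = 0.9 * math.exp(log_yp + rng.gauss(0, 0.1))  # proportional to permanent income
        housing = math.exp(log_yp + rng.gauss(0, 0.3))  # instrument: tracks permanent income only
        households.append((housing, income, consumption))

    # Household-level regression of log C on log Y: attenuated by transitory income.
    b_household = ols_slope([math.log(h[1]) for h in households],
                            [math.log(h[2]) for h in households])

    # Grouping estimator: rank by housing outlay, form equal-sized cells, then
    # regress log(mean consumption) on log(mean income) across cells.
    households.sort(key=lambda h: h[0])
    size = n // cells
    log_mean_y, log_mean_c = [], []
    for k in range(cells):
        cell = households[k * size:(k + 1) * size]
        log_mean_y.append(math.log(sum(h[1] for h in cell) / size))
        log_mean_c.append(math.log(sum(h[2] for h in cell) / size))
    b_grouped = ols_slope(log_mean_y, log_mean_c)
    return b_household, b_grouped

b_household, b_grouped = simulate()
print(f"household-level elasticity: {b_household:.2f}")
print(f"grouped (housing cells):    {b_grouped:.2f}")
```

The grouping works here only because the instrument is assumed to be unrelated to transitory income and to have no effect "of its own" on consumption; relaxing either assumption would bias the grouped estimate.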

The use of instrumental variables (or proxies) was much discussed. As Crockett- 
Friend noted in defending their paper, the use of proxies may introduce biases if the 
proxy variable has an effect “of its own” on consumption. For example, housing ex- 
penditures may represent not only the permanent level of income to which the house- 
hold has adjusted, but also the taste of the household for more vs. less luxurious 
housing. 

Proponents of the permanent income hypothesis—notably Reid, Eisner, and 
Mincer—made an analogous attack on most authors for using current income as a 
proxy for permanent income. They argued that multiple regression estimates of
total consumption on income, age, etc. would yield income coefficients that are too low,
because the effects of the permanent income variable are partially pre-empted by
non-income variables which are proxies for it (e.g., age, family size, occupation). Thus, the
more multivariate the analysis, the less likely it is that the income coefficient will rep- 
resent the true long-run relationship between permanent income and consumption. 
This position is asserted despite the precautionary steps taken by most authors of 
eliminating groups for whom transitory components are likely to be exceptionally 
large—the self-employed, incomes under $1,000 and over $10,000, the unemployed, 
nonwhites. 
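The critics' argument can be checked in a stylized linear model. In this hypothetical sketch (not drawn from the Proceedings), consumption depends only on permanent income, measured income is permanent plus transitory income, and a further regressor, standing in for age, family size, or occupation, is correlated with permanent income but not with the transitory component. With the invented variances used here, the income coefficient is about 0.8 when income enters alone and falls to about 0.44 once the proxy is added.

```python
import random

def slope(y, x):
    """Least-squares slope of y on a single regressor x, with an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def ols2(y, x1, x2):
    """OLS coefficients (b1, b2) of y on x1 and x2, fitted with an intercept,
    by solving the 2x2 normal equations in deviations from means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(42)
n = 20_000
permanent = [rng.gauss(0, 1.0) for _ in range(n)]
income = [p + rng.gauss(0, 0.5) for p in permanent]       # measured = permanent + transitory
proxy = [p + rng.gauss(0, 0.5) for p in permanent]        # hypothetical non-income proxy
consumption = [p + rng.gauss(0, 0.2) for p in permanent]  # depends on permanent income only

b_alone = slope(consumption, income)
b_income, b_proxy = ols2(consumption, income, proxy)
print(f"income coefficient, income alone:     {b_alone:.2f}")   # near 0.80 in large samples
print(f"income coefficient, income and proxy: {b_income:.2f}")  # near 0.44 in large samples
```

Neither figure equals the true coefficient of 1 on permanent income, which is the sense in which "the more multivariate the analysis," the further the income coefficient can drift from the long-run relationship.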

The reader of these volumes may find himself asking: Pure knowledge aside, to 
what use can this flood of regressions be put? If his concern is stabilization policy and 
more specifically, the short-run responses of consumers to short-run shifts in income, 
then he will find the current preoccupation with the permanent income hypothesis an 
unhappy development. For stabilization purposes theories explaining the relationship 
of measured consumption to measured income are relevant. Unfortunately these are 
precisely the relationships which the permanent income hypothesis eschews. 

In terms of future theoretical developments, I venture to say that the paper most 
likely to be remembered from this collection is that of Watts-Tobin, the only paper 
except Miner’s and Houthakker-Haldi’s to deal with the capital account. Watts-Tobin 
assert, test, and find statistical support for the hypothesis that consumers endeavor to 
maintain a certain balance among various items in the balance sheet so that the change in
each stock is negatively related to the initial level of that stock itself, but positively 
related to the initial level of other stocks. 
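The form of the Watts-Tobin hypothesis can be written as a pair of stock-adjustment equations. In the hypothetical sketch below (invented coefficients; the two stocks are labelled liquid assets and durables only for illustration), each stock's change depends negatively on its own initial level and positively on the other's, and least squares on a simulated cross-section recovers those signs. It shows only the shape of such a test, not Watts and Tobin's actual specification or data.

```python
import random

def ols2(y, x1, x2):
    """OLS coefficients (b1, b2) of y on x1 and x2, fitted with an intercept,
    by solving the 2x2 normal equations in deviations from means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(7)
n = 5_000
# Hypothetical initial stocks of two balance-sheet items (arbitrary units).
liquid = [rng.gauss(10.0, 2.0) for _ in range(n)]
durables = [rng.gauss(5.0, 1.5) for _ in range(n)]
# Invented adjustment rule: own level enters negatively, the other stock positively.
d_liquid = [1.0 - 0.3 * a + 0.2 * b + rng.gauss(0, 0.5) for a, b in zip(liquid, durables)]
d_durables = [0.5 + 0.1 * a - 0.4 * b + rng.gauss(0, 0.5) for a, b in zip(liquid, durables)]

own_l, cross_l = ols2(d_liquid, liquid, durables)    # coefficients on (liquid, durables)
cross_d, own_d = ols2(d_durables, liquid, durables)  # coefficients on (liquid, durables)
print(f"change in liquid assets: own {own_l:+.2f}, cross {cross_l:+.2f}")
print(f"change in durables:      own {own_d:+.2f}, cross {cross_d:+.2f}")
```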

The intellectual investments of nonspecialists in these volumes must necessarily be 
selective. Both by quality and topic, the following papers (in my judgment) should 
have first claim on the interest of the nonspecialist: Crockett-Friend (who estimate 
income, age, and other effects for all major categories of consumption and total
consumption), Danière-Gilboy (incorporating ingenious methods of resolving non-linear
relationships into linear relationships and of testing sequentially for interaction ef- 
fects), Houthakker-Haldi (whose model explains gross investment in cars as an at- 
tempt to maintain a desired inventory level), Lippitt or Peters (using analysis of 
variance and covariance respectively to isolate determinants of expenditures for 
home furnishings), Watts-Tobin, Modigliani’s Section III, Bodkin, Klein (on
entrepreneurial saving), and Miner (a two-period cross-section showing how consumer
debt parameters shifted over a six-year period). 

The reader should not miss the insightful comments of Eisner and Reid (on Friend- 
Crockett), Morgan (on the strategy of analysis), Duesenberry (on permanent in- 
come), Friedman (at his most ingenious in defending the permanent income hy- 
pothesis), Okun (arguing for a more eclectic theory of consumption), and Zellner 
(proposing a general theory incorporating permanent income and the Watts-Tobin 
stock-flow relationships).

Summing up, the remarks of Margaret Reid are pertinent: “Probably never has a 
single conference added so to a stock of regression coefficients. ... ” She then notes: 
“A stock of coefficients may be the beginning, but is not the end of wisdom.” By 
these terms this conference has made a good beginning. The attempt to create 
theoretical order out of this flood of empirical results will occupy consumption economists
for some time ahead! 


Collecting Food Purchase Data by Consumer Panel. G. G. Quackenbush and J. D. Shaffer. 
East Lansing: Michigan State University, 1960. (Technical Bulletin 279, Agricultural 
Experiment Station) Pp. 74. Paper. No price listed. 


Francis Walker, Purdue University


THIS publication reports some of the experience obtained in the development and
operation of the Michigan State University Consumer Panel. Many of the prob- 
lems encountered in the nine years of actual operation of the panel are discussed and, 
for many of these problems, procedures have been suggested which would alleviate 
the attendant difficulties. 

The Consumer Panel was composed of about 250 households selected to be repre- 
sentative of Lansing, Michigan, a city of about 100,000 population. During the 
period (1951-1958) of operation, 102,323 weekly purchase diaries were collected for 
tabulation. Data on food purchased for home use, giving the quantity, price, and 
expenditure for each item, were reported each week. In addition, expenditures and
quantities for meals away from home and income after federal income tax were re- 
ported. 

The report contains three main topics: (1) the sampling problem, (2) training, 
servicing, and building rapport, and (3) reliability of data. Included in these three 
areas were discussions on recruiting panel members, longevity of panel membership 
and its conditioning effect on the panel activities, panel vs. interview recall data, panel 
vs. telephone interview data, costs, research potential, and a copy of the diary used. 
A few results of research using this panel data were presented, but one must go to 
the original research reports in order to find anything but a brief statement of price 
and income elasticities, effects of advertising on consumption, and effects of season 
of year on consumption. 

Interview recall data and telephone interview data each gave evidence of overstating
the expenditures and quantities of food consumed in relation to those reported
currently by the panel. With regard to the conditioning effect of being on the
panel for long periods of time, the authors note that the extent and kind of
conditioning is unknown and that, in terms of total food expenditures, the conditioning
effect (29¢ per week average on an average weekly expenditure of $17.00) is not great
for the sample.
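To put the conditioning effect in proportion, the two averages quoted above imply a shift of under 2 percent of weekly food spending:

```latex
\frac{\$0.29 \text{ per week}}{\$17.00 \text{ per week}} \approx 0.017 \quad (\text{about } 1.7\ \text{percent})
```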

The high cost of forming and operating a consumer panel is justified by the
prolificness of the panel in data generation and the research potential of these data.

The greatest use for this report will probably be found by persons who are in- 
terested in interpreting the results of studies which have used the M.S.U. Consumer 
Panel data or by persons who might desire to obtain this panel data to use in their 
own research. The research worker interested in setting up a consumer panel will 
find few specifics in the general discussion contained in this bulletin. 


Naive Set Theory. Paul R. Halmos. Princeton, New Jersey: D. Van Nostrand
Company, Inc., 1960. Pp. viii, 104. $3.50.


Sidney G. Winter, Jr., The RAND Corporation


THE author explains in the preface the senses in which the word “naive” is to be
‘Bavheanse: “ ..the language and notation are those of ordinary informal 
mathematics. A more important way in which the naive point of view predominates 
is that set theory is regarded as a body of facts, of which the axioms are a brief and 
convenient summary; in the orthodox axiomatic view the logical relations among 
various axioms are the central objects of study.” (p.v.) Since the statement “set 
theory is regarded as a body of facts” presumably summarizes what the author 
regards as the chief distinctive characteristic of his book, it may be useful to comment 
on the meaning of this phrase as revealed in the book itself.

One interpretation of the statement would be that set theory is regarded as stating 
facts about, e.g., Venn diagrams, in the sense that Euclidean geometry could (naively) 
be regarded as stating facts about the geometric properties of figures drawn on the 
blackboard. The major virtue of an approach of this kind would be its emphasis on 
the intuitive “empirical” reasonableness of the results stated in the theory, and in 
a book taking this approach one would expect a wealth of diagrams and examples 
that would make clear the factual character of these results. This is not what Halmos 
has in mind. 

For, surprisingly, this treatment of set theory as a body of facts presents a theory 
in which there are no individuals—i.e., the elements of sets are always sets themselves. 
The author does say (pp. 1-2) that “By way of examples we might occasionally speak 
of sets of cabbages, and kings, and the like, but such usage is always to be construed 
as an illuminating parable only, and not as a part of the theory that is being de- 
veloped.” There are, however, very few such illuminating parables to be found in 
the book. Those few are confined entirely to the preliminary discussions of certain 
concepts, and never appear as illustrations of theorems. 
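A theory without individuals is less austere than it sounds, because the empty set alone is enough to build the natural numbers. The Python sketch below uses frozensets to mimic the usual von Neumann construction (0 = ∅ and successor x⁺ = x ∪ {x}), which is the construction Halmos adopts in his section on Numbers; the helper names are our own and purely illustrative.

```python
EMPTY = frozenset()  # the only starting material: no individuals, just the empty set

def successor(x: frozenset) -> frozenset:
    """The successor x+ = x ∪ {x}."""
    return x | frozenset([x])

def natural(n: int) -> frozenset:
    """The set-theoretic natural number n: 0 is the empty set, n + 1 = successor(n)."""
    x = EMPTY
    for _ in range(n):
        x = successor(x)
    return x

def show(x: frozenset) -> str:
    """Render a pure set as nested braces, listing elements by increasing size."""
    return "{" + ", ".join(show(e) for e in sorted(x, key=len)) + "}"

three = natural(3)
print(show(three))  # {{}, {{}}, {{}, {{}}}}
print(len(three))   # 3: the number n has exactly n elements
```

Every object printed is a set whose elements are themselves sets, which is exactly the situation the review describes.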

The contrast drawn between the naive approach and the axiomatic view in which 
“the logical relations among the axioms are the central objects of study” is a clearer 
guide to the special character of the book. Little attention is paid to the axioms, 
per se—to such questions as consistency, independence, possible alternative formula- 
tions, etc. There is barely a hint, for example, that the axiom of choice is in any sense 
more controversial than the other axioms introduced. The emphasis is on a rapid 
development of concepts and results rather than on a thorough examination of each 
step that is taken. In spite of the almost total absence of examples and the fast 
pace (with many steps left to the reader to verify), the presentation is lucid. There 
are relatively few places at which the reader with a minimum of “mathematical 
maturity” is likely to get seriously bogged down. 

The following partial list of section titles will indicate the range of topics treated 
(in addition to the obvious preliminaries): Relations, Functions, Numbers, The 
Peano Axioms, Arithmetic, Order, The Axiom of Choice, Zorn’s Lemma, Well
Ordering, Transfinite Recursion, Ordinal Numbers, The Schröder-Bernstein Theorem,
Countable Sets, Cardinal Numbers. As this list indicates, the book does not supply 
very much of the sort of set theory required for a rigorous development of prob- 
ability theory. Its direct usefulness to the statistician or teacher of statistics is, 
therefore, doubtful. It is, however, admirably suited to the purpose the author has 
in mind— “... to tell the beginning student of advanced mathematics the basic 
set-theoretic facts of life, and to do so with the minimum of philosophical discourse 
and logical formalism.” (p.v.) 


Elementary Logic of Science and Mathematics. P. H. Nidditch. Glencoe, Illinois: The 
Free Press, 1960. Pp. iv, 371. $4.00. 


Steven Orey, University of Minnesota


THIS book is intended as an elementary text on the methodology of science. The
ambitious scope of the book can be inferred from the following list of chapter 
headings: logic and the sciences, the propositional calculus, the functional calculus, 
Boolean algebra and set theory, observation and the operational pattern of science, 
measurement, probability, experimentation and hypothesis, mathematics and de- 
duction, deduction and hypothesis in the inductive sciences. 

Clearly the author does not have the space to discuss any of the topics with which 
he is concerned in great depth. In the chapter on probability, for example, the philo- 
sophical foundations of the subject are discussed in twelve pages. The next nine 
pages are devoted to a mathematical treatment of probability theory restricted to 
finite sample spaces. 
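Probability on a finite sample space reduces to counting when the outcomes are equally likely. The sketch below is a generic illustration under that assumption (two fair dice), not an example taken from Nidditch; it checks the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) exactly, using rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Finite sample space: ordered outcomes of two fair dice, assumed equally likely.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (number of outcomes in the event) / (number of outcomes)."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def sum_is_seven(w):
    return w[0] + w[1] == 7

def first_is_six(w):
    return w[0] == 6

p_a, p_b = prob(sum_is_seven), prob(first_is_six)
p_both = prob(lambda w: sum_is_seven(w) and first_is_six(w))
p_either = prob(lambda w: sum_is_seven(w) or first_is_six(w))
print(p_a, p_b, p_both, p_either)  # 1/6 1/6 1/36 11/36
```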

The book contains many interesting digressions and asides. In particular the fre- 
quent historical notes constitute a good feature. 


Introductory Formal Logic of Mathematics. P. H. Nidditch. Glencoe, Illinois: The Free 
Press, 1960. Pp. vi, 188. $3.00. 


Steven Orey, University of Minnesota 


THIS book is intended as an introductory text on formal logic and elementary set
theory. Its subject matter is the propositional and predicate calculus and the most
elementary parts of set theory.

The author expounds views which practically no mathematician would find ac- 
ceptable and which must certainly be confusing to the layman. He writes in his in- 
troduction: 


“The number of original books or papers on mathematics in the course of the last 
300 years is of the order of 10°, in these the number of even close approximations to 
really valid proofs is of the order of 10'. . . . It is remarkable, in view of this, that so 
many mathematicians pride themselves on what they call their ‘logical rigor’.” The 
book is suffused with polemical writing of this sort. 

In the technical development of the subject the author uses new notation and 
terminology throughout. The new terms are then usually abbreviated. This makes 
for unpleasant reading. 


Stochastic Processes: Problems and Solutions. Lajos Takács. (Translated by P. Zador)
New York: John Wiley and Sons, Inc., 1960. Pp. xii, 137. $2.75.


Leo Katz, Michigan State University 


THIS is another in the new series of Methuen’s monographs on applied probability
and statistics, edited by M. S. Bartlett. By definition, it is inexpensive, pocket-sized
but substantially bound, and intended as an introduction to a specialized topic;
in this case, stochastic processes of the most important types for applications in 
physics. 

The three principal chapters are devoted to Markov Chains, Markov Processes, 
and Non-Markovian Processes. In each case, the main results are given without proof 
and with rather brief discussion of the implications. Strategically placed in each 
chapter are lists of problems, the raison d’être of the book. Each problem can be solved
by use of the basic theorems given in the text and it is in the solution of these prob- 
lems that the reader is expected to develop understanding of the methods. The reader 
who has given a problem an honest try and has not succeeded in solving it, may turn 
to Chapter 4 to find complete solutions of all problems. Thus, the book has some 
similarities to teaching machines as well as to solitaire card games with cheating 
condoned. 

The reader should be familiar with the equivalent of the first ten chapters of Feller’s 
Probability. Chapters 1 and 2 of Takács give an alternative treatment as noted (with
some extensions) of the subsequent portion of Feller. Most of the results of Chapter 3
are scattered through the literature, representing relatively recent work by Black-
well, Cramér, Doob, Feller, Yaglom, Khintchine, Smith, Takács, and Täcklind; it
is indeed a useful service to have them organized and assembled as they are. 


Elements of Mathematical Statistics. Howard W. Alexander. New York: John Wiley and 
Sons, Inc., 1961. Pp. xi, 367. $7.95. 


ISADORE BLUMEN, Cornell University


This text will introduce the student with a little calculus to the elementary tech-
niques and manipulations used in mathematical statistics. The style of writing is
pleasant and informal with little attempt to introduce much rigor or abstraction in 
definitions or proofs. The many exercises for the student are appropriate to the 
level of the book. 

The author begins by introducing probability by reference to sets defined on a 
sample space. (This involves some introduction to the elements of set theory.) He 
then discusses random sampling for qualitative variates, obtaining the hypergeomet- 
ric distribution. The rest of the first chapter is largely descriptive statistics for the 
univariate case. 

In the next chapter, probability is defined for continuous variables, the usual ele- 
mentary probability densities are given, and the rule for getting the distribution of a 
function of a chance variable is discussed. Chapter 3 is concerned mostly with the 
distribution of sums of independent chance variables and some related topics such 
as generating functions and asymptotic normality. 

Chapter 4 is devoted to statistical inference: tests of hypotheses, confidence in- 
tervals, and point estimation. The method of maximum likelihood is explained and 
there is a starred section on likelihood ratio tests. 

Chapter 5 is concerned with the determination of the probability distributions of 
standard statistics. In particular, the distributions of s, t, and F are obtained under
the usual assumptions. An outline of a “proof” for the distribution of Pearson’s 
goodness-of-fit statistic is given in a starred section. 

Chapter 6 is a conventional treatment of regression and correlation while Chapter 7 





BOOK REVIEWS 1025 


is devoted to the analysis of variance for one and two-way classifications. In both 
chapters there is material in a starred section which covers some of the relevant dis- 
tribution theory. 

In general, the text will be inappropriate for good students or those really interested 
in developing their statistical or mathematical sophistication. Important topics are 
not systematically developed, statements of important theorems are excessively in- 
exact, and illustrations and outlines of proofs are confusing. By comparison with 
Brunk, Frazer, or Hogg and Craig, the approach to both statistics and mathematics 
has a distinctly old-fashioned flavor. 

The above remarks can be illustrated by three typical examples: 


(1) Student’s t is introduced and defined, and confidence intervals are defined and ob-
tained for the mean of a normal distribution, all in just three pages halfway through
the book. Not until forty pages later does the author really settle down to discussing
confidence intervals, and it is a hundred pages after the initial bolt out of the blue that
he derives the Student distribution. (Meanwhile, he has used the two-sample
t in his discussion of hypothesis testing.)


(2) In discussing the Central Limit Theorem, there is no statement as to what con-
vergence to a limiting distribution means. For clarification, we turn to two discus-
sions (pp. 146-51, pp. 213-15) of convergence of the binomial, and there it is argued
that $\sqrt{npq}\,\binom{n}{x}p^{x}q^{n-x} \to (2\pi)^{-1/2}e^{-z^{2}/2}$, where $z=(x-np)/\sqrt{npq}$. This is not the useful form of the theorem.


(3) The author has an extended (starred) discussion (pp. 251-259) of the Pearson
goodness-of-fit statistic. He outlines a proof of the asymptotic distribution by first
treating $z_i=(x_i-np_i)/\sqrt{n}$ as independent, approximately normal variables. The
reviewer found the attempt to lose the degree of freedom thus gained quite unconvincing.
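The two formulas criticized in items (2) and (3) can be checked numerically. The following is a minimal sketch, not taken from Alexander's text; the function names `binomial_local_ratio` and `pearson_statistic` are hypothetical. The first compares the scaled binomial probability with the normal density (the "local" form the reviewer objects to); the second computes Pearson's statistic from the deviations $z_i$ and shows that they always sum to zero, which is exactly the linear constraint that costs one degree of freedom and which an "independent variables" argument ignores.

```python
import math
from math import comb


def binomial_local_ratio(n: int, p: float, x: int) -> tuple[float, float]:
    """Return (sqrt(npq) * P(X = x), standard normal density at z) for X ~ Bin(n, p)."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    z = (x - n * p) / sd
    lhs = sd * comb(n, x) * p**x * q**(n - x)  # scaled point probability
    rhs = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)  # normal density
    return lhs, rhs


def pearson_statistic(counts: list[int], probs: list[float]) -> tuple[float, float]:
    """Return (X^2, sum of z_i), with z_i = (x_i - n p_i) / sqrt(n) and X^2 = sum z_i^2 / p_i."""
    n = sum(counts)
    z = [(x - n * p) / math.sqrt(n) for x, p in zip(counts, probs)]
    x2 = sum(zi * zi / p for zi, p in zip(z, probs))
    # sum(z) is 0 whenever sum(counts) == n and sum(probs) == 1:
    # only k - 1 of the k deviations are free.
    return x2, sum(z)


lhs, rhs = binomial_local_ratio(400, 0.5, 210)
x2, total = pearson_statistic([18, 22, 20, 40], [0.2, 0.2, 0.2, 0.4])
```

Because the $z_i$ satisfy $\sum_i z_i = 0$ identically, they cannot be independent, which is why the limiting distribution of $X^2$ has $k-1$ rather than $k$ degrees of freedom.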


A Second Course in Statistics. R. Loveday. New York: Cambridge University Press, 1961. 
Pp. xi, 155. $1.85. 


GERALD J. LIEBERMAN, Stanford University 


Although this book is called A Second Course in Statistics, it is really an elementary
book and should be viewed as a first course. There are no statistical prerequisites, 
and calculus is not required although the integral sign does appear occasionally. 

The book contains the following chapter headings: 1) The Normal Distribution, 
2) Probability, 3) Random Selection, 4) The Binomial Distribution, 5) The Poisson 
Distribution, 6) The χ²-Distribution, 7) The Use of χ² in Testing Contingency
Tables, 8) Samples and Significance, 9) Quality Control, 10) Method of Least 
Squares, 11) Correlation by Product-Moments, 12) Correlation by Ranks, and 13) 
Miscellaneous Exercises. Each chapter contains exercises which are quite good. 
These exercises are slanted towards the physical and engineering sciences. 

The text is a “cook book” and contains little material that doesn’t appear in each 
of a dozen other books of this type. Its assets are brevity (114 pages), good ex- 
ercises, and a logical sequence of chapters. Its shortcomings are those associated with 
“cook books,” somewhat exaggerated by the brief treatment of many of the topics. 
Assumptions and pitfalls underlying the standard techniques are often underscored. 


Elements of Statistical Inference. R. M. Kozelka. Reading, Massachusetts: Addison-
Wesley Publishing Company, Inc., 1961. Pp. x, 150. $5.00.
Prerer Zeuna, Colorado State College 


This brief treatment of statistical inference was intended and written primarily
for the student of the behavioral sciences with a minimum calculus background 
(the first four chapters of Richmond’s introductory calculus, for example). For 
students with more than this minimum calculus background a more extensive treat- 
ment would be desirable and is available; for students with no calculus background 
the book would certainly not be suitable. Hence it should be emphasized at the outset
that the usefulness of the book as a classroom textbook is definitely limited to
students with a similar background. With these restrictions in mind the author has 
done a fairly decent job of bringing together most of the standard topics in statistics 
and presenting them in abbreviated form. It is his avowed purpose to illustrate the 
basic ideas of statistics without spending time on techniques. Subject to the com- 
ments made below, it is the reviewer’s opinion that he has accomplished his task. 

The teacher who uses this text will need to have or else develop a tolerant attitude 
toward the light-hearted style in which the book is written. The author states in his 
preface that his examples and problems shall be concerned with happenings in King 
Arthur’s Court and surrounding territory, to “avoid having to deal with ‘realistic’ 
data and the resulting numerical complications which often lead the beginning student 
to lose sight of the objective of the problem.” In the first place the examples are not 
entirely restricted as asserted (nor should they be); in the second place it is not clear 
that the difficulties for the beginning student alluded to above could not have been 
avoided equally well by employing fictitious data with a more modern setting. At
times the continued reference to King Arthur’s Court becomes a little distracting. 
More than once the reason given for a restriction of a problem and/or an abbreviated 
notation is that statisticians are lazy people. Certainly the author did not intend that 
this be taken seriously and the remark is in keeping with his easy style. Even so, the 
space might well have been devoted with profit to a discussion of the more cogent 
reasons for such restrictions which do, after all, exist. 

It was refreshing to see more emphasis placed on approximation and on the use of the
symbol ≈ when specifying approximate answers. It is unfortunate that the symbol was
not used consistently; for example, such things as “. . . = 0.238 nearly” appear where
the approximation symbol might well have been used. Two topics that are quite im-
portant to the student of the behavioral sciences are random sampling and
the assumption of a continuous scale of measurement. The reviewer was favorably
impressed with the author’s honest treatment of these topics.

The treatment of probability as a set function is up-to-date and reasonably well 
done. In the reviewer’s opinion it is unfortunate that the author failed to at least 
mention the difference between an outcome and the event (set) consisting of a single 
outcome, since the trouble was taken to define probability as a set function. 
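The distinction the reviewer asks for can be made concrete. The following minimal sketch is not from Kozelka's book; it assumes a single roll of a fair die with equally likely outcomes, and the names `OMEGA` and `prob` are hypothetical. Probability is written as a set function whose arguments are events (subsets of the sample space), so the outcome `3` and the event `{3}` are different objects, and only the latter has a probability.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair die (equally likely outcomes assumed).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P as a set function: its argument must be an event, i.e. a subset of OMEGA."""
    if not isinstance(event, frozenset):
        raise TypeError("P is defined on events (sets), not on bare outcomes")
    if not event <= OMEGA:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))


outcome = 3                    # an element of OMEGA
simple_event = frozenset({3})  # the event {3}, a subset of OMEGA
even = frozenset({2, 4, 6})

print(outcome in OMEGA)        # True: 3 is an outcome
print(simple_event <= OMEGA)   # True: {3} is an event
print(prob(simple_event))      # 1/6
```

Calling `prob(outcome)` raises `TypeError`, which mirrors the point that an outcome is not itself in the domain of a set function.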


Annotated Economic Statistics of Japan for Postwar Years up to 1958. Institute of Eco- 
nomic Research, Hitotsubashi University. Tokyo, Japan: 1960. Pp. vii, 192. No price listed. 


M. BRONFENBRENNER, University of Minnesota


JAPAN’S closest approach to the National Bureau of Economic Research is perhaps
the Institute of Economic Research at Hitotsubashi University in the outskirts
of Tokyo. Founded in 1948 with Shigeto Tsuru as its first Director, the Institute has 
achieved in short order an international reputation in empirical economics from the 
work of scholars like Tsuru, Kazushi Ohkawa, and the present Director, Chotaro 
Takahashi. 

The volume under review assembles, with detailed methodological introductions 
and notes, some 75 principal statistical series used at and in some cases developed by 
the Institute. These are divided into 13 major groups: national income and wealth, 
population and labor force, primary production, secondary production, commerce, 
business population, international economics, transportation, prices, employment and 
wages, consumption and family budgets, money and banking, and public finance. 
Data cover varying periods, usually beginning in 1929 or the early 1930’s and ending 
in 1958. A surprising amount of detail is included, considering how small the book is
compared with, say, the American Statistical Abstract. Certain of the individual tables also
attain a degree of sophistication unexpected in a compilation of this type. Thus the 
section on national income and wealth includes not only the expected time series but 
also an integration of the national income, money flows, and national balance sheet 
plus a 36-by-36 input-output table. (Each of these covers only one year, 1958 for the 
first table mentioned and 1954 for the second.) The section on international economic 
relations also includes an international trade matrix (for 1955) and indexes of quan- 
tity, price, and trade terms for Japan since 1929. 

Aside from its title and table of contents, the book is entirely in Japanese for 
Japanese readers. Many Japanese official and other statistical sources include table 
titles and notes in English, but not this one. The user of these tables must either 
know Japanese himself or have particular tables translated for his use, together with 
the accompanying notes and introductory annotations. Translation is not usually 
an easy task, since Japanese-English “Economic Dictionaries” are in fact stronger in 
commerce and business administration than in economics or statistics. 

This is in summary an invaluable reference collection of Japanese statistical data 
for the past generation, which few specialists on contemporary Japanese economic 
problems can afford to be without. Its international value might have been increased 
by concession to possible readers ignorant of the Japanese language, but if we make 
no concessions to Americanists ignorant of English, why expect concessions to 
Japanologists ignorant of Japanese? 


Money, Growth and Methodology: In Honor of Johan Åkerman. Hugo Hegeland, Editor.
Lund, Sweden: CWK Gleerup, 1961. Pp. xii, 509. 


This volume of 42 essays in economics is published in honor of Johan Åkerman.

The volume is divided into five parts, as follows: Seven Views on the Theory of
Growth; Money, Interest, and Inflation; Methodology, Ideology, and Economic
Concepts; Trade, Competition, and Programming; Economic Policy and Economic
Development. Contributors include Erich Schneider, Brinley Thomas, Jan Tinbergen,
Gerhard Tintner, Mary Jean Bowman, Oskar Morgenstern, Gunnar Myrdal, Hans
Brems, Paul Samuelson, and Alvin Hansen. R. F.


Towards a Firmer Basis of Economic Policy. National Bureau of Economic Research. 
New York: National Bureau of Economic Research, Inc., 1961. Pp. 85. 


Part 1 of this Forty-First Annual Report of the National Bureau stresses the role
of its research program in the development of economic policy. In this section, 
Solomon Fabricant brings out the importance of various National Bureau projects 
for the analysis of business cycles and of economic growth. 

Part 2 of this report presents a general view by Geoffrey H. Moore of the activities 
of the National Bureau during the 1960’s; detailed reports on individual projects are 
provided by the staff members in Part 3. R. F. 


Yearbook of National Accounts Statistics. United Nations, Department of Economic and 
Social Affairs. New York: United Nations, 1961. Pp. 284. $3.50. 


The present Yearbook, the fourth issue in this series, presents national income data
for 95 countries and territories. Detailed national accounts are presented for 69 
countries. The data are generally annual estimates for the period 1953-1959. 

A summary section of international tables provides time series data for principal 
aggregates for 92 countries and territories for the years 1950-1959. Among other 
statistics, these tables show rates of growth of the components of real gross domestic 
product and relative shares in national income of different types of income. R. F.


Quality Control Yearbook, 1960. Arnold J. Rosenthal, Editor-in-Chief. New York: Inter-
science Publishers, Inc., 1960. Pp. 1369. $60.00.

This high-priced volume abstracts papers dealing with quality control and applied
statistics published in periodicals in the United States and other countries in 1960. 
Besides quality control, subjects covered include mathematical statistics and probabil- 
ity theory, experimental designs, correlation and curve fitting, operations research 
(including sales forecasting and business economics), and statistical measurement and 
process control. On the whole, the abstracts are well done, presenting the principal 
findings of the particular article in detail, including tables and charts. 
Unfortunately, locating literature under principal headings is not easy. This ma-
terial was originally published in a series of monthly abstracts, and the editors did
not see fit to collate the separate abstracts. As a result, abstracts under the same 
principal classification are scattered throughout the book, with no page references 
even in the table of contents. Subject and author indexes are appended, but are no 
substitute for the inadequate table of contents of an otherwise potentially valuable 
reference volume. R. F. 


Contributions to Economic Analysis: Economic Forecasts and Policy. Second revised edi- 
tion. H. Theil. Amsterdam: North Holland Publishing Company, 1961. Pp. xxxii, 567. 
$11.25. 

The second revised edition of this important work includes a number of modifica-
tions and additions that should make it even more useful to students of quantitative
economics and to statisticians interested in knowing the current problems and achieve- 
ments of their discipline in a significant special field of application. Part of the devel- 
opment of statistical techniques has been simplified and additional material has been 
added on mixed estimation, small sample properties of simultaneous equation es- 
timators, trace correlation of a simultaneous system, and extension of the Gauss- 
Markov theorem. C. H. 


Teacher Supply and Demand in the Public Schools, 1961. National Education Association 
of the United States. Washington, D. C.: National Education Association of the U. S., 
1961. Pp. 48. $1.00. Paper.

The Research Division of the National Education Association now makes annually
a national report on teacher supply and demand in public schools. This is the 
fourteenth such report. The Research Division of the National Education Associa- 
tion also has been making biennial reports on the need for and supply of university, 
college, and junior-college teachers. The fourth report in this biennial series, entitled 
“Teacher Supply and Demand in Universities, Colleges, and Junior-Colleges, 1959-60 
and 1960-61” was scheduled for publication in May 1961. 

The contents of the present report are indicated by the main headings used, namely: 
the Supply of Teachers, the Demand for Teachers, the Preparation of Elementary- 
School Teachers, and the Principal Occupations of Members of the Class of 1960. 

Many data are assembled which can only briefly be indicated by the headings of 
the eight major tables. These include: College Students Completing Certificate Re- 
quirements, 1961 and 1960, by field; College Students Completing Certificate re- 
quirements, 1961 and 1960, by State; College Graduates Prepared to Teach, by Field, 
and Per Cent Change from 1950; New Teachers Employed in 1960-61 and New 
Teachers Produced in 1959-60: 26 States and District of Columbia, by Major and 
Minor Fields; Elementary-School Teachers in Certain States, by Semester Hours 
of Credit, 1948-49 through 1960-61; Elementary-School Teachers in 11 States and 
the District of Columbia, by Semester Hours of Credit, 1948-49 through 1960-61; 
Elementary-School Teachers in 38 States and the District of Columbia, by Semester 
Hours of Credit, 1960-61; Occupation on November 1, 1960, of Persons Who Were 
Graduated Between September 1, 1959, and August 31, 1960, With Qualifications for 
Standard Teaching Certificates. 

In addition, there is an appendix table giving college and university students com- 
pleting certificate requirements, 1961 and 1960, by state and type of preparation. 
Also, for several states, there is a table of the number of new elementary- and high-school
teachers who did not teach anywhere the preceding year according to their major and 
minor fields. 

As will be plain from the statement of the contents of this report given above, 
there is much interesting information here and there are many other tables. The 
comments made on these and other tables in the report are quite clear and interest-
ing. In addition, there is a page of highlights.

The report is well worth reading by anyone concerned with elementary- and high-
school education. From the statistician’s point of view, however, the report would
benefit from the addition of a brief technical appendix, say 5 to 10 pages, stating and 
evaluating the methods used in obtaining the estimates and predictions. W.G.M. 


Economic Survey Data. Survey Research Center. Ann Arbor, Michigan: University of 
Michigan, 1960. Pp. 60. No price listed. Paper. 

This pamphlet describes the data available to qualified users from the Survey
Research Center at the University of Michigan. Primary emphasis is given to data
from the Surveys of Consumer Finances, secondary emphasis to the Periodic Surveys
of Consumer Attitudes and Behavior, and brief mention to other smaller and more
specific studies. A discussion of how to use the data and a description of the Ford
Foundation Program for making data available are also included. The appended
bibliography covers published materials using the survey data. M.N.


Measurement and Evaluation in Psychology and Education. Second edition. Robert L. 
Thorndike and Elizabeth Hagen. New York: John Wiley and Sons, Inc., 1961. Pp. viii, 
602. $7.25.

The first edition of this book, appearing in 1955, provided a basic text and reference
source for individuals planning to use tests in guidance and education. The second 
edition is similar to the first in approach and content. A number of sections have been 
rewritten, and in two instances, the order of the chapters has been changed. A 
discussion of major new tests appearing in the last six years has been included, and 
reference is made to significant new research where it fits the pattern of an elementary 
text. R. S.


Numerical Methods of Curve Fitting. P. G. Guest. New York: Cambridge University 
Press, 1961. Pp. xiv, 422. $15.00.

Readers familiar with Brunt’s “The Combination of Observations” (1917) will
find the present book to be a modernized version of similar material. “The book 
is intended primarily for students and graduates in physics, and the types of ob- 
servations discussed will be those most commonly met with in routine work in the 
physical sciences. It is hoped that the book will be useful as a reference work for 
statisticians and biologists, since much of the material presented here does not find 
a place in statistical textbooks but is only available in the original literature.” With 
this aim in mind, the author separates the book into three sections. Part I is designed 
as a summary of the relevant statistical theory, e.g., expectation, normal and related 
distributions, statistical tests, Poisson and binomial distributions. Part II is con- 
cerned with the fitting of straight lines. Part III (Chapters 7 to 12) comprises the 
principal subject of the book and deals with the fitting of polynomial and special 
types of curves. Although Part II is a special case of Part III, the importance of the linear
case warrants a separate treatment. Chapter 7: Estimation of the Polynomial Co- 
efficients contains a detailed discussion of the normal equations, including an 
iterative method of solution, orthogonal polynomials, equally spaced observations 
with equal and varying weights. Chapter 8 is concerned with standard deviations 
of the estimates, and Chapter 9 with the grouping of observations.
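To make "the normal equations" of Chapter 7 concrete, here is a minimal sketch of a weighted least-squares polynomial fit. It is not Guest's procedure: it forms the normal equations in the power basis and solves them directly by Gaussian elimination with partial pivoting, rather than by the iterative method or orthogonal polynomials the book discusses. The function name `fit_polynomial` is hypothetical.

```python
def fit_polynomial(xs, ys, degree, weights=None):
    """Weighted least-squares polynomial fit via the normal equations.

    Returns coefficients [a0, a1, ..., a_degree] for y ~ a0 + a1*x + ... .
    """
    if weights is None:
        weights = [1.0] * len(xs)
    m = degree + 1
    # Normal equations: sum_k a_k * S[j + k] = T[j], with
    # S[r] = sum w * x**r and T[j] = sum w * x**j * y.
    s = [sum(w * x**r for x, w in zip(xs, weights)) for r in range(2 * degree + 1)]
    a = [[float(s[j + k]) for k in range(m)] for j in range(m)]
    b = [float(sum(w * x**j * y for x, y, w in zip(xs, ys, weights))) for j in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        if a[col][col] == 0.0:
            raise ValueError("normal equations are singular")
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, m))) / a[r][r]
    return coef


# Exact quadratic data y = 1 + 2x + 3x^2 at equally spaced x.
print(fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2))
```

The power-basis normal equations become badly conditioned as the degree rises, which is one reason orthogonal polynomials, especially for equally spaced observations, are attractive in practice.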

There are many examples fully worked out. A very useful feature is the inclusion of 
a notes and reference section and a listing of relevant tables. 

Presumably, the notation is conventional to a physicist, but this reviewer was 
bothered by the lack of distinction between random variables and constants, e.g., 
$E y = Y$, $X = y - Y$, $Q(X) = \Pr\{x > X\}$. The statistician who is an occasional user of
these techniques will perhaps find the book troublesome; the frequent user will be 
able to overcome the minor deficiencies and should be able to take advantage of the 
vast amount of information contained in this book. I. O.





PUBLICATIONS RECEIVED 


Asher, Robert E. Grants, Loans, and Local 
Currencies: Their Role in Foreign Aid. 
Washington, D. C.: Brookings Institu- 
tion, 1961. $2.50 Cloth, $1.50 Paper. 

Banbury, J. and Maitland, J., Editors. 
Proceedings of the Second International 
Conference on Operational Research. New 
York: John Wiley and Sons, Inc., 1961. 
$15.00. 

Bass, Frank M., Buzzell, Robert D., 
Greene, Mark R., Lazer, William, Pesse- 
mier, Edgar A., Shawver, Donald L., 
Shuchman, Abraham, Theodore, Chris 
A., Wilson, George W., Editors. Mathe- 
matical Models and Methods in Market-
ing. Homewood, Illinois: Richard D. Ir-
win, Inc., 1961. $8.5

Becker, Joseph M. The Adequacy of the 
Benefit Amount in Unemployment In- 
surance. Kalamazoo, Michigan: W. E. 
Upjohn Institute, 1961. No charge,
paper.

Bierman, Harold, Jr., Fouraker, Law- 
rence E., and Jaedicke, Robert K. Quan- 
titative Analysis for Business Decisions. 
Homewood, Illinois: Richard D. Irwin, 
Inc., 1961. $7.95. 


Bowen, John Clinton. Some Aspects of
Transfer Taxation in the United States.
(Ph.D. Dissertation, University of Mich- 
igan, 1958) Ann Arbor, Michigan: J. C. 
Bowen—Printed by University Micro- 
films, L. C. Card No. Mic. 58-7687. 
Price not listed, paper. 

Bowman, Edward H. and Fetter, Robert B. 
Analysis for Production Management, Re- 
vised Edition. Homewood, Illinois: Rich- 
ard D. Irwin, Inc., 1961. $8.75. 

Break, George F. The Economic Impact of 
Federal Loan Insurance. Washington, 
D. C.: National Planning Association, 
1961. $3.75, paper. 

Charnes, A. and Cooper, W. W. Manage- 
ment Models and Industrial Applications 
of Linear Programming, Volume I. New 
York: John Wiley and Sons, Inc., 1961. 
$11.75. 

Commission on Money and Credit. Money 
and Credit: Their Influence on Jobs, 
Prices, and Growth. New York: Prentice- 
Hall, Inc., 1961. $3.95 Cloth, $2.00 
Paper. 

Community Council of Greater New York. 
Home Aide Service Needs of Health 
Agency Clientele. New York: Community 
Council of Greater New York, 1961. 
$1.50, paper.

East African Statistical Department, Ugan- 
da Unit. Enumeration of Employees, 
June 1960. met Kenya: EASD, 
1961. 3 Sh., paper. 

Educational Testing Service. Invitational
Conference on Testing Problems, 1960 
Proceedings. Princeton, N. J.: Educa- 
tional Testing Service, 1961. No price 
listed, paper. 


Faville, David E. Selected Cases in Market- 
ing Management. Englewood Cliffs, N. J.: 
Prentice-Hall, Inc., 1961. $4.95 Text, 
$6.60 Trade. 

Gear, H. S., Biraud, Y., and Swaroop, S. 
International Work in Health Statistics 
1948-1958. Geneva, Switzerland: World 
Health Organization, 1961. $.60, paper. 

Goodman, Bernard. Industrial Materials in
Canadian American Relations. Detroit: 
Wayne State University Press, 1961. 
$7.00. 

Hegeland, Hugo, Editor. Money, Growth, 
and Methodology. (In honor of Johan
Åkerman) Lund, Sweden: CWK Gleerup,
1961. No price listed. 

Inter American Statistical Institute, Pan 
American Union. America En Cifras, 
1960. No. 1—Estadisticas Demograficas. 
No. 2—Estadisticas Economicas: Pro- 
ducciones Agropecuaria, Extractiva Vege- 
tal Y Pesquera. Washington, D. C.: Pan 
American Union, 1960. $.25, paper. 

International Statistical Institute. Bulletin 
De L’Institut International De Statis- 
tique (32nd Session, Tokyo, 1960.) Tokyo: 
International Statistical Institute, 1961.
No price listed, paper. 

Johnston, J. Statistical Cost Analysis. New 
York: McGraw-Hill Book Company, 
Inc., 1960. $6.75. 

Joint Economic Committee, Congress of the 
United States. Government Price Statis- 
tics—Part 2. Washington, D. C.: United 
States Government Printing Office, 1961. 
$.65, paper. 

Klaman, Saul B. The Postwar Residential 
Mortgage Market. (No. 8 in National 
Bureau of Economic Research Studies in 
Capital Formation and Financing)
Princeton, N. J.: Princeton University 
Press, 1961. $7.50. 

Kyburg, Henry E., Jr. Probability and the 
Logic of Rational Belief. Middletown, 
Connecticut: Wesleyan University Press, 
1961. $10.00. 

Lisle, Edmond. L’Anglais Economique: 
Dictionnaire De Concepts. Paris: Editions 
Cujas, 1961. No price listed, paper. 

Maxwell, A. E. Analysing Qualitative Data. 
(Methuen’s Monographs on Applied 
Probability and Statistics) New York: 
John Wiley and Sons, Inc., 1961. $3.00. 

Medgyessy, Pál. Decomposition of Super-
positions of Distribution Functions. Buda-
pest, Hungary: Publishing House of the
Hungarian Academy of Sciences, 1961.
No price listed. 

Ministry of Agriculture, Fisheries and 
Food. Domestic Food Consumption and 
Expenditure: 1959—Annual Report of 
the National Food Survey Committee. Lon- 
don, England: Her Majesty’s Stationery 
Office, 1961. 8s. 6d. net, paper. 

Mode, Elmer B. Elements of Statistics, 
Third Edition. Englewood Cliffs, N. J.: 
Prentice-Hall, Inc., 1961. $9.65 Trade, 
$7.25 Text. 

Mosteller, Frederick, Rourke, Robert 
E. K., and Thomas, George B., Jr. Prob- 
ability: A First Course. Reading, Massa- 
chusetts: Addison-Wesley Publishing 
Company, Inc., 1961. $5.00. 

Mosteller, Frederick, Rourke, Robert 
E. K., and Thomas, George B., Jr. Prob- 
ability with Statistical Applications. 
Reading, Massachusetts: Addison-Wes- 
ley Publishing Company, Inc., 1961. No 
price listed. 

Mueller, Willard F. and Garoian, Leon. 
Changes in the Market Structure of Gro- 
cery Retailing. Madison: The University 
of Wisconsin Press, 1961. $6.00. 

National Bureau of Economic Research, 
Inc. Towards a Firmer Basis of Economic 
Policy—Forty-Fifth Annual Report. New
York: National Bureau of Economic Re- 
search, Inc., 1961. No charge, paper. 

National Education Association. Teacher 
Supply and Demand in Universities, Col- 
leges, and Junior Colleges, 1959-60 and 
1960-61. Research Report 1961-R12. 
Washington, D. C.: National Education 
Association, Research Division, 1961. No 
charge while available, paper. 

National Education Association of the 
United States. Teacher Supply and De- 
mand in Public Schools, 1961. Washing- 
ton, D. C.: National Education Associa- 
tion of the United States, 1961. $1.00,
paper.

National Science Foundation. Employment 
of Scientific and Technical Personnel in 
State Government Agencies: Report on a 
1959 Survey. Washington, D. C.: United
States Government Printing Office, 1961. 
$.45, paper. 

National Science Foundation. Weather 
Modification, Second Annual Report, 
1960. Washington, D. C.: United States 
Government Printing Office, 1961. $.15, 
paper. 

maathes, Gottfried E. Guide to Probability 
and Statistics. Reading, Massachusetts: 
Addison-Wesley Publishing Company, 
Inc., 1961. No price listed, paper. 

Petersen, William. Population. New York: 
The Macmillan Company, 1961. $7.95. 

Piatier, André. Statistique Et Observation
Economique: Two Volumes. I—Méth-
odologie-Statistique, II—Econométrie-Con-
joncture Comptabilité Nationale. Paris,
France: Presses Universitaires de France,
1961. I—NF 20, II—NF 22, paper.

Rosenthal, Arnold J., Editor-in-Chief. 
Quality Control and Applied Statistics 
Yearbook 1960. New York: Interscience 
Publishers, Inc., 1960. $60.00. 

Schultz, Vincent. An Annotated Bibliogra- 
phy on the Use of Statistics in Ecology— 
A Search of 31 Periodicals. Washington, 
D. C.: Office of Technical Services, Dept. 
of Commerce, 1961. $3.00, paper. 

Solomon, Herbert, Editor. Studies in Item
Analysis and Prediction. Stanford, Cali-
fornia: University Press, 1961. $8.75.

State of Hawaii, State Planning Office.
The General Plan of the State of
Hawaii. Honolulu, Hawaii: State of
Hawaii, State Planning Office, 1961. No
price listed, paper.

Survey Research Center, University of 
Michigan. 1960 Survey of Consumer Fi- 
nances. Ann Arbor, Michigan: Survey 
Research Center, University of Michigan, 
1961. No price listed, paper. 

Survey Research Center, University of 
Michigan. Economic Survey Data. Ann 
Arbor, Michigan: Survey Research Cen- 
ter, University of Michigan, 1960. No 
price listed, paper. 

Taskier, Charlotte E. Input-Output Bibli- 
ography 1955-1960. New York: United 
Nations, 1961. No price listed, paper. 

Thorndike, Robert L. and Hagen, Eliza- 
beth. Measurement and Evaluation in 
Psychology and Education, Second Edi- 
tion. New York: John Wiley and Sons, 
Inc., 1961. $7.25. 

Tumin, Melvin M. and Feldman, Arnold. 
Social Class and Social Change in Puerto 
Rico. Princeton, N. J.: University Press, 
1961. $10.00. 

United Nations, Department of Economic 
and Social Affairs. Yearbook of National 
Accounts Statistics 1960. New York: 
United Nations, 1961. $3.50, paper. 

United States Department of Commerce, Bureau of the Census. Congressional District Data Book (A Statistical Abstract Supplement). Washington, D. C.: United States Government Printing Office, 1961. $1.00, paper.

United States Department of Commerce, 
Business and Defense Services Adminis- 
tration. Basic Research Related to New 
Uses for Textiles 1961. Washington, 
D. C.: United States Government Print- 
ing Office, 1961. $.50, paper. 

United States Department of Commerce, 
Business and Defense Services Adminis- 
tration. Comparative Fabric Production 
Costs in the United States and Four Other 
Countries. Washington, D. C.: United 
States Government Printing Office, 1961. 
$.40, paper. 

United States Department of Health, Education, and Welfare, Office of Education. Statistics of State School Systems: 1957-1958, Chapter 2, Organization, Staff, Pupils, and Finances. Washington, D. C.: United States Government Printing Office, 1961. $.75, paper.

von Gersdorff, Ralph. Portugals Finanzen (Public Finance in Portugal) (Text in German). Bielefeld, West Germany: Verlag Ernst und Werner Gieseking, 1961. DM 18.80, paper.

Weatherburn, C. E. A First Course in 
Mathematical Statistics—Reprint. New 
York: Cambridge University Press, 
1961. $2.75, paperbound. 







Westoff, Charles F., Potter, Robert G., 
Jr., Sagi, Philip C., and Mishler, El- 
liot G. Family Growth in Metropolitan 
America. Princeton, New Jersey: Uni- 
versity Press, 1961. $10.00. 

Wiener, Norbert. Cybernetics: Or Control 
and Communication in the Animal and the 
Machine. Second Edition. New York: 
The Tech. Press, M.I.T. and John Wiley 
and Sons, Inc., 1961. $6.50. 

World Health Organization. Catalogue of World Health Organization Publications 1947-1960. Geneva, Switzerland: World Health Organization, 1961. No price listed, paper.

World Health Organization. Expert Com- 
mittee on Health Statistics, Seventh Re- 
port. Geneva, Switzerland: World Health 
Organization, 1961. $.30, paper. 

Wright, Grace S. (U. S. Dept. of Health, Education, and Welfare). Requirements for High School Graduation in States and Large Cities. Washington, D. C.: United States Government Printing Office, 1961. $.20, paper.


EDITORIAL COLLABORATORS 


(Continued from page vi) 


Robert C. McCarry, Stanford Research Institute

Walter W. McMahon, University of Illinois

Paul Meier, University of Chicago

Rupert G. Miller, Jr., Stanford University

Felix E. Moore, University of Michigan

James N. Morgan, Survey Research Center, University of Michigan

Sigeiti Moriguti, University of Tokyo and Columbia University

Donald F. Morrison, National Institutes of Health

John Neter, University of Minnesota

Arthur M. Okun, Yale University

Joel Owen, Harvard University

Emanuel Parzen, Stanford University

William S. Peters, Arizona State University

Daniel O. Price, University of North Carolina

Frank Proschan, Boeing Scientific Research Laboratories

Ronald Pyke, University of Washington

Howard Raiffa, Harvard University

Paul R. Rider, Wright-Patterson Air Force Base

Joan Raup Rosenblatt, National Bureau of Standards

Jerome Sacks, Cornell University

I. R. Savage, University of Minnesota

L. J. Savage, University of Michigan

Esther Seiden, Michigan State University

Albert Shapero, Stanford Research Institute

William C. Shelton, United States Department of Labor


Jacob S. Siegel, United States Bureau of the Census

Sidney Siegel, Pennsylvania State University

Monroe G. Sirken, National Vital Statistics Division

Walt R. Simmons, United States National Health Survey

Morris Skibinsky, Purdue University

Milton Sobel, Bell Telephone Laboratories

Mortimer Spiegelman, Metropolitan Life Insurance Company

William A. Spurr, Stanford University

Charles Stein, Stanford University

Andrew Sterrett, Denison University

George J. Stigler, University of Chicago

Mervyn Stone, Princeton University

Daniel B. Suits, University of Michigan

Robert Summers, University of Pennsylvania

Robert F. Tate, University of Washington

Gerhard Tintner, Iowa State University

John Tukey, Princeton University

Constance van Eeden, Michigan State University

D. F. Votaw, Jr., Yale University

David L. Wallace, University of Chicago

John E. Walsh, System Development Corporation

Alfred G. Whitney, Life Insurance Agency Management Association

Richard C. Wilcock, University of Illinois

Martin B. Wilk, Rutgers University

S. S. Wilks, Princeton University

Hans Wolff, Michigan State University

Loring Wood, Bureau of the Census

Samuel Zahl, Cambridge Air Force Research Laboratories




















JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


VOLUME 56: 1961 


NUMBERS 293-296 


Published Quarterly by the 
AMERICAN STATISTICAL ASSOCIATION 
WASHINGTON, D. C. 

1961 














JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


The Editors welcome the submission of manuscripts for possible publication. They should be typewritten entirely double-spaced, including footnotes, and two copies should be sent to the Editor, Clifford Hildreth, Wells Hall, Section A, Michigan State University, East Lansing, Michigan. Books for review should be sent to the same address. Unsolicited book reviews are not accepted, but suggestions of titles for review are welcome.


EDITOR 


Clifford Hildreth, Michigan State University
ASSISTANTS TO THE EDITOR 


PHYLLIS QUINN 
BARBARA BOEHLKE 


ASSOCIATE EDITORS 


Jerome Cornfield, National Institutes of Health
Ingram Olkin, Stanford University
Robert Ferber, University of Illinois
John W. Pratt, Harvard University
William G. Madow, Stanford Research Institute
Rosedith Sitgreaves, Columbia University
Marc Nerlove, Stanford University
Herbert Solomon, Stanford University


ADVISORY PANEL OF FORMER EDITORS 


William G. Cochran (1945-50), Harvard University
Frederick F. Stephan (1935-40), Princeton University
Frank A. Ross (1926-34, 41-5), Thetford, Vermont
W. Allen Wallis (1950-60), University of Chicago






Corrigenda: Readers and authors are urged to submit to the Editor notices of 
errors found in this or any previous volume. These will be published once a year, 
in the December issue. 

















EDITORIAL COLLABORATORS 


Irma Adelman, Stanford University

William R. Allen, Princeton, New Jersey

R. L. Anderson, North Carolina State College

Fred C. Andrews, University of Oregon

Frank J. Anscombe, Princeton University

Kenneth J. Arrow, Stanford University

Francis M. Baror, The RAND Corporation

Ronald H. Beattie, California Bureau of Criminal Statistics

Gary S. Becker, National Bureau of Economic Research

C. B. Bell, San Diego State College

Donald Bentley, Stanford University

Agnes Berger, Columbia University

Joseph Berkson, Mayo Clinic

Max A. Bershad, United States Bureau of the Census

A. T. Bharucha-Reid, University of Oregon

David Blackwell, University of California, Berkeley

Isadore Blumen, Cornell University

Colin Blyth, University of Illinois

Donald J. Bogue, University of Chicago

Charles P. Bonini, Stanford University

J. C. G. Boot, Netherlands School of Economics

Ralph A. Bradley, Florida State University

— Brady, University of Pennsylvania

Daniel Brill, Board of Governors of Federal Reserve System

K. A. Brownlee, University of Chicago

H. D. Brunk, University of Missouri

Edward C. Budd, Yale University

Robert J. Buehler, Iowa State University

D. L. Burkholder, University of Illinois

Irving W. Burr, Purdue University

John Buttrick, University of Minnesota

Phillip Cagan, Brown University

Douglas G. Chapman, University of Washington

Herman Chernoff, Stanford University

Keewhan Choi, Harvard University

Gregory C. Chow, Cornell University

Carl F. Christ, Johns Hopkins University

Paul C. Clifford, Montclair State College

Robert Clower, Northwestern University

William G. Cochran, Harvard University

William W. Cochrane, University of Minnesota

A. Clifford Cohen, Jr., University of Georgia

Kalman J. Cohen, Carnegie Institute of Technology

W. S. Connor, Research Triangle Institute

R. R. Coveyou, Oak Ridge National Laboratory

Dudley J. Cowden, University of North Carolina

David R. Cox, University of London

Cecil C. Craig, University of Michigan

Sidney J. Cutler, National Cancer Institute

Tore Dalenius, University of California, Berkeley


Cuthbert Daniel, New York City

F. N. David, University College, London

Herbert T. David, Iowa State University

Robert T. Davis, Stanford University

Calvert L. Dedrick, United States Bureau of the Census

Morris H. DeGroot, University of California, Los Angeles

W. Edwards Deming, New York University

A. P. Dempster, Harvard University

W. J. Dixon, University of California Medical Center, Los Angeles

Robert Dorfman, Harvard University

Harold F. Dorn, National Heart Institute

Norman R. Draper, University of Wisconsin

Acheson J. Duncan, Johns Hopkins University

David B. Duncan, University of North Carolina

Meyer Dwass, Northwestern University

James S. Earley, National Bureau of Economic Research

Richard A. Easterlin, Stanford University

A. — Ecker, Bell Telephone Laboratories, Inc.

Churchill Eisenhart, National Bureau of Standards

Harry Eisenpress, IBM Corporation

Robert M. Elashoff, Harvard University

Lila Elveback, The Public Health Research Institute of New York City

Mary Epuina, Stanford University

W. Duane Evans, United States Department of Labor

A. V. Fenn, Stanford Research Institute

Robert B. Fetter, Yale University

— Firestone, The City College of New York

R. J. Foote, Connell and Company

Murray F. Foss, United States Department of Commerce

Louis A. Fourt, Market Research Corporation of America

Karl Fox, Iowa State University

Benjamin Frank, United States Department of Justice

Lester R. Frankel, Audits and Surveys Company, Inc.

D. A. S. Fraser, University of Toronto

Marshall Freimer, Massachusetts Institute of Technology

John E. Freund, Arizona State University

Milton Friedman, University of Chicago

John J. Gart, Johns Hopkins University

Charles E. Gates, University of Minnesota

R. C. Geary, Dublin, Ireland

Seymour Geisser, National Institutes of Health

Sudhish Ghurye, Northwestern University

Charles Y. Glock, University of California, Berkeley

R. Gnanadesikan, Bell Telephone Laboratories, Inc.

Ruth Gould, Columbia University




Arthur Goldberger, University of Wisconsin

Selma F. Goldsmith, United States Bureau of the Census

Leo A. Goodman, University of Chicago

Z. Govindarajulu, University of Minnesota

Wilson H. Grabill, United States Department of Commerce

Franklin A. Graybill, Colorado State University

Samuel Greenhouse, National Institute of Mental Health

J. Arthur Greenwood, University of Minnesota

Harold D. Griffin, Lincoln, Nebraska

Zvi Griliches, University of Chicago

E. J. Gumbel, Columbia University

John Gurland, Iowa State University

Robert L. Gustafson, Michigan State University

Irwin Guttman, McGill University

Robert J. Hader, University of North Carolina

William M. Haenszel, National Cancer Institute

David Halgey, Acadia University, Nova Scotia

W. J. Hall, University of North Carolina

Max Halperin, General Electric Company

C. Horace Hamilton, North Carolina State College

H. Leon Harter, Wright-Patterson Air Force Base

H. O. Hartley, Iowa State University

Philip M. Hauser, University of Chicago

Laurence Hoersst, Harvard University

Fritz Herzog, Michigan State University

Bruce M. Hill, Stanford University

Walter E. Hoadley, Armstrong Cork Company

Wassily Hoeffding, University of North Carolina

Robert V. Hogg, University of Iowa

Richard H. Holton, University of California, Berkeley

Robert Hooke, Westinghouse Electric Corporation

A. S. Householder, Oak Ridge National Laboratory

H. S. Houthakker, Stanford University

Sidney A. Jaffe, United States Department of Labor

Gwilym Jenkins, Stanford University

Raymond Jessen, C-E-I-R, Inc.

M. Vernon Johns, Jr., Stanford University

Eugene A. Johnson, University of Minnesota

Howard L. Jones, University of Chicago

Dale W. Jorgenson, University of California, Berkeley

M. L. Juncosa, The RAND Corporation

Harold A. Kahn, National Heart Institute

Hyman B. Kaitz, United States Department of Labor

E. L. Kaplan, Lawrence Radiation Laboratory, University of California

Oscar Kempthorne, Iowa State University

Nathan Keyfitz, University of Toronto

Allyn W. Kimball, Johns Hopkins University


Bradford F. Kimball, Public Service Commission

Evelyn M. Kitagawa, University of Chicago

L. R. Klein, University of Pennsylvania

James W. Knowles, Joint Economic Committee, Congress of the United States

Mordechai Kreinin, Michigan State University

William H. Kruskal, University of Chicago

Roy R. Kuebler, Jr., University of North Carolina

George M. Kuznets, University of California, Berkeley

Jack Laderman, Office of Naval Research

Robert J. Lampman, University of Wisconsin

Stanley Lebergott, United States Bureau of the Budget

E. L. Lehmann, University of California, Berkeley

Joseph Lev, University of the State of New York

Howard Levene, Columbia University

Gerald Lieberman, Stanford University

Ta-Chung Liu, Cornell University

Frederic M. Lord, Educational Testing Service

Eugene Lukacs, Catholic University of America

Albert Madansky, The RAND Corporation

John G. Magistad, Sandia Corporation

Lester V. Manderscheid, Michigan State University

Nathan Mantel, National Cancer Institute

Eli Marks, National Analysts, Inc.

Albert W. Marshall, Princeton University

J. G. Mauldon, University of California, Berkeley

Philip J. McCarthy, Cornell University

Robert C. McCarry, Stanford Research Institute

Daniel McFadden, University of Minnesota

Walter W. McMahon, University of Illinois

Paul Meier, University of Chicago

William Mendenhall, Bucknell University

Margaret Merrell, Shelburne, New Hampshire

John E. Milholland, University of Michigan

Rupert G. Miller, Jr., Stanford University

Felix E. Moore, University of Michigan

James N. Morgan, University of Michigan

Sigeiti Moriguti, University of Tokyo

Donald F. Morrison, National Institutes of Health

Nathan Morrison, New York State Department of Labor

Lincoln E. Moses, Stanford University

Frederick Mosteller, Harvard University

John Neter, University of Minnesota

J. —, University of California, Berkeley










Harold Nisselson, United States Bureau of the Census

Arthur M. Okun, Yale University

Guy H. Orcutt, University of Wisconsin

Bernard Ostle, Arizona State University

Joel Owen, Harvard University

Louis J. Paradiso, United States Department of Commerce

Emanuel Parzen, Stanford University

Melvin P. Peisakoff, The RAND Corporation

Edward B. Perrin, University of Pittsburgh

William S. Peters, Arizona State University

Roger G. Peterson, Oregon State College

Daniel O. Price, University of North Carolina

Frank Proschan, Boeing Scientific Research Laboratories

William E. Pruitt, Stanford University

Ronald Pyke, University of Washington

Roy Radner, University of California, Berkeley

Morton S. Raff, United States Bureau of Labor Statistics

Howard Raiffa, Harvard University

George J. Resnikoff, Illinois Institute of Technology

Paul R. Rider, Wright-Patterson Air Force Base

John Riordan, Bell Telephone Laboratories, Inc.

Harry V. Roberts, University of Chicago

Harry M. Rosenblatt, Federal Aviation Agency

Joan Raup Rosenblatt, National Bureau of Standards

Richard N. Rosett, University of Rochester

Herman Rubin, Michigan State University

Jerome Sacks, Cornell University

Stephen M. Samuels, Stanford University

John D. Sargan, University of Chicago

I. R. Savage, University of Minnesota

L. J. Savage, University of Michigan

Henry Scheffé, University of California, Berkeley

Lorraine Schwartz, University of California, Berkeley

Esther Seiden, Michigan State University

Albert Shapero, Stanford Research Institute

William C. Shelton, United States Department of Labor

Jacob S. Siegel, United States Bureau of the Census

Sidney Siegel, Pennsylvania State University

Walt R. Simmons, United States National Health Survey


Monroe G. Sirken, National Vital Statistics Division

Morris Skibinsky, Purdue University

Victor E. Smith, Michigan State University

Milton Sobel, University of Minnesota

Mortimer Spiegelman, Metropolitan Life Insurance Company

R. Clay Sprowls, University of California, Los Angeles

William A. Spurr, Stanford University

J. H. Stapleton, Michigan State University

Charles Stein, Stanford University

Andrew Sterrett, Denison University

George J. Stigler, University of Chicago

Mervyn Stone, Princeton University

Seymour Sudman, Market Research Corporation of America

Daniel B. Suits, University of Michigan

Robert Summers, University of Pennsylvania

Charles E. Swanson, The Curtis Publishing Company

Lorie Tarshis, Stanford Company

Robert F. Tate, University of Washington

Henry Teicher, New York University

Benjamin J. Tepping, National Analysts, Inc.

Gerhard Tintner, Iowa State University

John Tukey, Princeton University

Constance van Eeden, Michigan State University

Andrew Vazsonyi, Ramo-Wooldridge

D. F. Votaw, Jr., Yale University

David L. Wallace, University of Chicago

John E. Walsh, System Development Corporation

Lionel Weiss, Cornell University

Alfred G. Whitney, Life Insurance Agency Management Association

Richard C. Wilcock, University of Illinois

Martin B. Wilk, Rutgers University

S. S. Wilks, Princeton University

Hans Wolff, Michigan State University

Loring Wood, United States Bureau of the Census

Max Woodbury, New York University

Ralph S. Woodruff, United States Bureau of the Census

Theodore D. Woolsey, United States National Health Survey

Thomas A. Yancey, University of Illinois

N. Donald Ylvisaker, New York University

Samuel Zahl, United States Bureau of the Census

Marvin Zelen, National Bureau of Standards

Arnold Zellner, Netherlands School of Economics

Calvin Zippin, University of California Medical Center, San Francisco














JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


CONTENTS OF VOLUME 56


THE 120TH ANNUAL MEETING
Summaries of Papers Delivered . . . 388

ARTICLES . . . 1, 223, 493, 783

NOTES ABOUT AUTHORS . . . 160, 384, 737, 1001

BOOK REVIEWS . . . 163, 411, 740, 1006

PUBLICATIONS RECEIVED . . . 219, 471, 778, 1031

CORRIGENDA . . . 1005

INDEX TO VOLUME 56, 1961
Articles, by Author . . . 1035
Book Reviews, by Author . . . 1037
List of Reviewers . . . 1042

REPORTS AND OFFICIAL NOTES
Report of the Board of Directors . . . 474
Report of the Secretary-Treasurer . . . 476
Report of the Auditors . . . 479











INDEX TO VOLUME 56, 1961 
ARTICLES 


Adelman, Irma and Griliches, Zvi, On an Index of Quality Change

Ames, Edward and Reiter, Stanley, Distributions of Correlation Coefficients in Economic Time Series

Anscombe, F. J., Estimating a Mixed-Exponential Response Law

Anscombe, F. J., Rectifying Inspection of Lots

Basmann, R. L., A Note on the Exact Finite Sample Frequency Functions of Generalized Classical Linear Estimators in Two Leading Over-Identified Cases

Berger, Agnes, On Comparing Intensities of Association Between Two Binary Characteristics in Two Different Populations

Berkson, Joseph, The Other Side of the Lower Bound. A Note with a Correction

Birnbaum, Allan, Confidence Curves: An Omnibus Technique for Estimation and Testing Statistical Hypotheses

Brenna, Leroy Stanley and Kramer, Clyde Young, Factorial Treatments in Rectangular Lattice Designs

Brown, Murray, Ex Ante and Ex Post Data in Inventory Investment

Buechley, Robert W., A Reproducible Method of Counting Persons of Spanish Surname

Capon, Jack, A Note on the Asymptotic Normality of the Mann-Whitney-Wilcoxon Statistic

Cox, Edwin B., Changes in the Size Distribution of Dividend Income

Davies, M., Multiple Linear Regression Analysis with Adjustment for Class Differences

Duncan, Otis Dudley, Occupational Components of Educational Differences in Income

Dunn, Olive Jean, Multiple Comparisons Among Means

Ederer, Fred, A Parametric Estimate of the Standard Error of the Survival Rate

Edgington, Eugene S., Probability Table for Number of Runs of Signs of First Differences in Ordered Series

Edwards, Carol B. and Gurland, John, A Class of Distributions Applicable to Accidents

El-Badry, M. A., Failure of Enumerators to Make Entries of Zero: Errors in Recording Childless Cases in Population Censuses

Fisher, Walter D., A Note on Curve Fitting with Minimum Deviations by Linear Programming

Fisk, P. R., Estimation of Location and Scale Parameters in a Truncated Grouped Sech Square Distribution

Folks, John Leroy and West, Del Lon, Note on the Missing Plot Procedure in a Randomized Block Design

Freund, John E., A Bivariate Extension of the Exponential Distribution

Freund, Rudolf J., Vail, Richard W., and Clunies-Ross, C. W., Residual Analysis

Gallaway, Lowell E. and Smith, Paul E., A Quarterly Econometric Model of the United States

Goldberger, Arthur S., Stepwise Least Squares: Residual Analysis and Specification Error

Goldberger, Arthur S. and Jochems, D. B., Note on Stepwise Least Squares

Goodman, Leo A., Statistical Methods for the Mover-Stayer Model

Goodman, Leo A. and Grunfeld, Yehuda, Some Nonparametric Tests for Comovements Between Time Series

Gumbel, E. J., Bivariate Logistic Distributions

Gupta, Shanti S. and Groll, Phyllis A., Gamma Distribution in Acceptance Sampling Based on Life Tests

Gustafson, Robert L., Partial Correlations in Regression Computations











535 
637 
493 
807 
619 


889 
670 


246 


368 
518 


88 





687 
250 







783 
52 
111 







156 





503 





909 





359 






692 





933 
971 






98 






379 





998
105 
841 







11 
333 






942 














Halperin, Max, Almost Linearly-Optimum Combination of Unbiased Estimates

Halperin, Max, Fitting of Straight Lines and Prediction when Both Variables Are Subject to Error

Hansen, Morris H., Cooperation Among Statistical and Other Societies

Harter, H. Leon, The Use of Sample Ranges in Setting Exact Confidence Bounds for the Standard Deviation of a Rectangular Population

Hogg, Robert V., On the Resolution of Statistical Hypotheses

Jaeger, Carol M. and Pennock, Jean L., An Analysis of Consistency of Response in Household Surveys

Johnson, N. L., A Simple Theoretical Approach to Cumulative Sum Control Charts

Jorgenson, Dale W., Multiple Regression Analysis of a Poisson Process

Ladd, George W., On Some Measures of Food Marketing Services

Lancaster, H. O., Significance Tests in Discrete Distributions

Laubscher, Nico F., On Stabilizing the Binomial and Negative Binomial Variances

Lazerwitz, Bernard, A Comparison of Major United States Religious Groups

Lees, Ruth W. and Lord, Frederic M., A Nomograph for Computing Partial Correlation Coefficients

Lehman, Shirley Young, Exact and Approximate Distributions for the Wilcoxon Statistic with Ties

Leone, F. C., Rutenberg, Y. H., and Topp, C. W., The Use of Sample Quasi-Ranges in Setting Confidence Intervals for the Population Standard Deviation

Lindsey, G. R., The Progress of the Score during a Baseball Game

Mandel, John, Non-Additivity in Two-Way Analysis of Variance

Murthy, M. N. and Sethi, V. K., Randomized Rounded-Off Multipliers in Sampling Theory

Neyman, J. and Scott, E. L., Further Comments on the "Final Report of the Advisory Committee on Weather Control"

Nieto de Pascual, Jose, Unbiased Ratio Estimators in Stratified Sampling

Pasternack, Bernard S. and Ogawa, Junjiro, The Probability of Reversal Associated with a Test Procedure when Data are Incomplete

Peach, Paul, Bias in Pseudo-Random Numbers

Pratt, John W., Length of Confidence Intervals

Preston, Lee E. and Bell, Earl J., The Statistical Analysis of Industry Structure: An Application to Food Industries

Raj, Des, On Matching Lists by Samples

Riley, H. E., Some Aspects of Seasonality in the Consumer Price Index

Robson, D. S. and Vithayasai, C., Unbiased Componentwise Ratio Estimation

Rogot, Eugene, A Note on Measurement Errors and Detecting Real Differences

Ross, Alan, Variance Estimates in "Optimum" Sample Designs

Shah, S. M., A Note on Griffin's Paper "Graphic Computation of Tau as a Coefficient of Disarray"

Shah, S. M., The Asymptotic Variances of Method of Moments Estimates of the Parameters of the Truncated Binomial and Negative Binomial Distributions

Stekler, H. O., Forecasting Industrial Production

Taeuber, Karl E., Haenszel, William, and Sirken, Monroe G., Residence Histories and Exposure Residences for the United States Population

Tate, R. F., On the Use of Partially Ordered Observations in Measuring the Support for a Complete Order

Theil, H. and Nagar, A. L., Testing the Independence of Regression Disturbances

Thomlinson, Ralph, A Model for Migration Analysis

Thompson, D. J. and Kodlin, D., A Note on Follow-up for Survival in the Presence of Movement

Tintner, Gerhard, The Statistical Work of Oskar Anderson

Tomasson, Richard F., Bias in Estimates of the U.S. Nonwhite Population as Indicated by Trends in Death Rates

Wilkins, Coleridge A., A Problem Concerned with Weighting of Distributions











INDEX TO VOLUME 56 
BOOK REVIEWS 


Acton, Forman S., Analysis of Straight-Line Data . . . R. L. Anderson
Akerman, Johan, Theory of Industrialism: Causal Analysis and Economic Plans . . . Amartya Kumar Sen
Alder, Henry L. and Roessler, Edward B., Introduction to Probability and Statistics . . . T. A. Bancroft
Alexander, Howard W., Elements of Mathematical Statistics . . . Isadore Blumen
Andriot, John L., Guide to U. S. Government Statistics . . . John I. Griffin
Armour Research Foundation, Proceedings of the 1959 Computer Applications Symposium
Bailey, Norman T. J., Statistical Methods in Biology . . . Lincoln E. Moses
Bartlett, M. S., Stochastic Population Models in Ecology and Epidemiology . . . Norman T. J. Bailey
Bennion, E. G., Elementary Mathematics of Linear Programming and Game Theory . . . Franklin R. Shupp
Black, Duncan, The Theory of Committees and Elections . . . Gerald L. Thompson
Boas, Ralph P., A Primer of Real Functions . . . Fritz Herzog
Bodino, Giuseppe Avondo, Applicazioni Economiche della Teoria dei Grafi . . . Kenneth J. Arrow
Bodino, Giuseppe Avondo, I Processi Stocastici in Statistica . . . D. L. Burkholder
Bowen, Earl K., Statistics: With Applications in Management and Economics . . . Zane M. Polemis
Bowman, Edward H. and Fetter, Robert B., Editors, Analyses of Industrial Operations . . . Harvey M. Wagner
Brambilla, Francesco, La Distribuzione dei Redditi . . . Paul G. Craig
Brazer, Harvey E., City Expenditures in the United States . . . Howard G. Schaller
Brunk, H. D., An Introduction to Mathematical Statistics . . . George E. Nicholson, Jr.
Bush, Robert R. and Estes, William K., Editors, Studies in Mathematical Learning Theory . . . J. Laurie Snell
Central Bureau of Statistics, Historisk Statistik för Sverige
Chandrasekhar, S., China's Population, Census and Vital Statistics . . . John S. Aird
Chenery, Hollis B. and Clark, Paul G., Interindustry Economics . . . O. H. Brownlee


Churchman, C. West, Prediction and Optimal Decision . . . Nicholas M. Smith
Cogan, Edward J., Norman, Robert Z., and Thompson, Gerald L., Calculus of Functions of One Argument . . . Melcher P. Fobes
Copeland, Morris A., Trends in Government Financing . . . Kenneth D. Roose
Croxton, Frederick E. and Cowden, Dudley J., Practical Business Statistics . . . Richard C. Henshaw, Jr.
Daan Pusan, 'teint, Bibliography on Income and Wealth, Volume VII, 1955-56
Debreu, Gerard, Theory of Value: An Axiomatic Analysis of Economic Equilibrium . . . Robert H. Strotz
Deming, W. Edwards, Sample Design in Business Research . . . Leslie Kish
Duncan, Otis Dudley, Cuzzort, Ray P., and Duncan, Beverly, Statistical Geography: Problems in Analyzing Areal Data . . . Leslie Kish
Eiteman, Wilford J., Price Determination in Oligopolistic and Monopolistic Situations . . . Willard Sparks
Eldridge, Hope T., The Materials of Demography: A Selected and Annotated Bibliography
Enrick, Norbert L., Quality Control, Fourth Edition . . . W. —
Evely, R. and Little, I. M. D., Concentration in British Industry . . . Irma Adelman




414 
1008 
184 
1024 
749 


468 
179 


1015 
210 
176 
214 


760 
759 


188 


175 
750 


204 


412 


167 
468 


443 


424 
1013 


467 
753 


194 
468 


422 
740 


1011 


453 


217 
771 
















EzexieLt, Morprecal AND Fox, Karu A., Methods of Correlation and Regression 
Analysis. Linear and Curvilinear. Third Edition . . . . RoGer MILLER 
Ferauson, Georce A., Statistical Analysis in Psychology and Education 
al Caer Ae a BENJAMIN Rosner 
FirEsTONE, JOHN M., Federal Receipts and ' Bapenditures ‘During Business Cycles, 
1879-1968 . . . . Ortro EcksTEINn 
FisHER, IRVING Norton, A Baliegranhe of the Writings of Irving Fisher 
Forster, Francis M., Epiror, Evaluation of Drug Therapy jf 
Forsytu, ANDREW R., Theory of Differential — Three Vebsius ‘ , 
ss PavuL Brock 
Fasnon AN, Rous ALD, . Wanurron, Peaean K,, AND Caupsaut, Artuur A., Family 
Pleaning, Sterility, and Population Growth oe ee Hants F. Doan 
FREUND, JOHN E., Modern Elementary Statistics, Second Edition . . Ray Hyman 
FRIEND, IRWIN AND JONES, Rospert, Epitors, Proceedings of the Conference on Con- 
sumption and Saving, Volumes I and II . . » . . E. Scorr Mayrngs 
GauteE, Davip, The Theory of Linear Economic Models . . Date W. JoRGENSON 
Garvy, GreorGE, Debits and Clearings Statistics and Their Use . E.T. WEILER 
Gerra, M. J., The Demand, cies and Price Structure of Eggs . 
AntuHony P. STEMBERGER 
GERRISH, F,, Pure Methematics ree a ae ee Paul MEYER 
GINSBURG, Noarox, Atlas of Economic Desslegment i 
GoLpsTEIN, SIDNEY, Consumption Patterns of the Aged. Study of Consumer Expend- 
itures, Incomes, and Savings in the United States . . . JANET A. FISHER 
Grant, EvGene L. anp IrEson, W. Grant, Principles of Engineering Economy. 
Fourth Edition. . . . . . & B. Lervrauze 
Grecory, R. H. anp Van Horn, R. Mt Autometic Data- Processing Systems: Princi- 
ples and Procedures . . et ale DANIEL regceral 
Guest, P. G., Numerical Methods of Curse Fitting . 
Haper, Witit1aM, McKean, EvGene C., anv TayYLor, Haroup C, “The et 
Economy: Its Potential and Its Prsblonn. Trek Ae oe Wanenar E. STRINER 
Hatmos, Paut R., Naive Set Theory . . Se ie Sipney G. WINTER, JR. 
HANNA, FRANK A., The Compilation of Manufacturing Statistics . MicHae, Gort 
HANSEN, BertraANp L., Work Sampling for Modern Management 
i ae aay . FRANK J. Wiiiass 
HaRMAN, HARRY H., Meders Foun Analysie “2d ania Donaup F. Morrison 
HAUSER, PHILIP M. AND DUNCAN, OTIS DUDLEY, EDITORS, The Study of Population: An Inventory and Appraisal . . . George F. Mair
HEGELAND, HUGO, EDITOR, Money, Growth, and Methodology: In Honor of Johan Åkerman
HILL, A. BRADFORD, EDITOR, Controlled Clinical Trials . . . Donald Mainland
HITOTSUBASHI UNIVERSITY, Annotated Economic Statistics of Japan for Postwar Years up to 1958 . . . M. Bronfenbrenner
HOCHWALD, H., STRINER, H. E., AND SONENBLUM, S., Local Impact of Foreign Trade: A Study of Methods of Local Economic Accounting. A Staff Report . . . H. O. Carter
HOWARD, RONALD A., Dynamic Programming and Markov Processes . . . Marshall Freimer
INDUSTRIAL STATISTICS COMMITTEE OF EASTMAN KODAK COMPANY, Symbols, Definitions and Tables for Industrial Statistics and Quality Control . . . George J. Resnikoff
INTERNATIONAL LABOUR OFFICE, International Migration, 1945-1957 . . . Richard A. Easterlin
INTERNATIONAL LABOUR OFFICE, Yearbook of Labour Statistics
ISARD, WALTER, Methods of Regional Analysis . . . Michael J. Brennan
JOHNSON, PALMER O. AND RAO, MUNAMARTY S., Modern Sampling Methods: Theory, Experimentation, Applications . . . Harold Nisselson





192 
185 
427 
776 
775 
466 


429 
446 


1018 


175 











INDEX TO VOLUME 56 


KAHN, C. HARRY, Personal Deductions in the Federal Income Tax . . . Amotz Morag
KEMENY, JOHN G. AND SNELL, J. LAURIE, Finite Markov Chains . . . Glen E. Baxter
KHINTCHINE, A. Y., Mathematical Methods in the Theory of Queueing . . . Emanuel Parzen
KISH, GEORGE, Economic Atlas of the Soviet Union . . . Gite Rogie
KOZELKA, R. M., Elements of Statistical Inference . . . Peter Zehna
KRISTENSEN, THORKIL AND ASSOCIATES, The Economic World Balance . . . Irma Adelman
LARNA, KAARLO, The Money Supply, Money Flows and Domestic Product in Finland, 1910-1956 . . . Karl Brunner
LEE, Y. W., Statistical Theory of Communication . . . Amiel Feinstein
LEFEBER, L., Allocation in Space: Production, Transport and Industrial Location . . . James N. Boles
LEHMANN, E. L., Testing Statistical Hypotheses . . . John W. Pratt
LEIBY, JAMES, Carroll Wright and Labor Reform: The Origin of Labor Statistics . . . Mark Perlman
LE ROY, HENRI LOUIS, Statistische Methoden der Populationsgenetik . . . Newton E. Morton
LEVY, MICHAEL E., Income Tax Exemptions . . . Amotz Morag
LI, C. C., Numbers from Experiments: A Basic Analysis of Variation . . . Carl E. Hopkins
LICHTENBERG, ROBERT M., One-Tenth of a Nation: National Forces in the Economic Growth of the New York Region . . . Donald L. Foley
LICHTENBERG, ROBERT M., The Role of Middleman Transactions in World Trade . . . John B. Henderson
LIEBERMAN, GERALD J. AND OWEN, DONALD B., Tables of the Hypergeometric Probability Distribution
LINDGREN, B. W. AND MCELRATH, G. W., Introduction to Probability and Statistics . . . Harry Weingarten
LISTER, LOUIS, Europe's Coal and Steel Community: An Experiment in Economic Union . . . H. Lupeun
LONG, CLARENCE D., Wages and Earnings in the United States, 1860-1890 . . . Seymour L. Wolfbein
LOVEDAY, R., A Second Course in Statistics . . . Gerald J. Lieberman
LUCE, R. DUNCAN, Individual Choice Behavior: A Theoretical Analysis . . . Patrick Suppes
MACK, SIDNEY F., Elementary Statistics . . . Jules Joskow
MALZBERG, BENJAMIN, The Alcoholic Psychoses: Demographic Aspects at Mid-Century in New York State
MAMORIA, C. B., Population and Family Planning in India . . . Harold F. Goldsmith


MARCSON, SIMON, The Scientist in American Industry . . . Carl F. Kossack
MARK, MARY LOUISE, Statistics in the Making: A Primer in Statistical Survey Method . . . John M. Firestone
MARRIS, ROBIN, Economic Arithmetic . . . John P. Henderson


MEYER, JOHN R., PECK, MERTON J., STENASON, JOHN, AND ZWICK, CHARLES, The Economics of Competition in the Transportation Industries . . . Herbert D. Mohring
MILLER, KENNETH S., An Introduction to the Calculus of Finite Differences and Difference Equations . . . Gordon E. Latta
MOORE, WILBERT E. AND FELDMAN, ARNOLD S., EDITORS, Labor Commitment and Social Change in Developing Areas . . . Eugene Staley
MORAN, P. A. P., The Theory of Storage . . . Herbert Scarf


MURPHY, DOUGLAS P. AND ABBEY, HELEN, Cancer in Families: A Study of the Relatives of 200 Breast Cancer Probands . . . J. Yerushalmy


1039 
751 
182 
744 
469 

1025 
757 


454 
417 


202 
163 


770 


760 
751 


180 
463 
212 
777 
187 
448 


174 
1025 


172 
190 


469 


441 
455 


189 
456 
458 
466 


764 
411 


440 











1040 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1961 


NATIONAL BUREAU OF ECONOMIC RESEARCH, An Appraisal of the 1950 Census Income Data. Studies in Income and Wealth . . . Harold Lydall
NATIONAL BUREAU OF ECONOMIC RESEARCH, Demographic and Economic Change in Developed Countries . . . Alexander Gerschenkron
NATIONAL BUREAU OF ECONOMIC RESEARCH, The Quality and Economic Significance of Anticipations Data . . . Richard R. Nelson
NATIONAL BUREAU OF ECONOMIC RESEARCH, Towards a Firmer Basis of Economic Policy
NATIONAL BUREAU OF ECONOMIC RESEARCH, Trends in the American Economy in the Nineteenth Century . . . Alfred H. Conrad
NATIONAL EDUCATION ASSOCIATION OF THE UNITED STATES, Teacher Supply and Demand in the Public Schools, 1961
NIDDITCH, P. H., Elementary Logic of Science and Mathematics . . . Steven Orey
NIDDITCH, P. H., Introductory Formal Logic of Mathematics . . . Steven Orey
NIXON, J. W., A History of the International Statistical Institute, 1885-1960 . . . Gertrude M. Cox
ORGANISATION FOR EUROPEAN ECONOMIC CO-OPERATION, Statistics of Sources and Uses of Finances, 1948-58
PARKER, WILLIAM V. AND EAVES, JAMES C., Matrices . . . Richard V. Evans
PARZEN, EMANUEL, Modern Probability Theory and Its Applications . . . J. Laurie Snell
PILLAI, K. C. S., Statistical Tables for Tests of Multivariate Hypotheses
PLACKETT, R. L., Principles of Regression Analysis . . . William G. Cochran
PURCELL, THEODORE V., Blue Collar Man: Patterns of Dual Allegiance in Industry . . . Martin Patchen
QUACKENBUSH, G. G. AND SHAFFER, J. D., Collecting Food Purchase Data by Consumer Panel . . . Francis Walker
QUENOUILLE, M. H., The Analysis of Multiple Time-Series . . . T. W. Anderson
RALSTON, ANTHONY AND WILF, HERBERT S., EDITORS, Mathematical Methods for Digital Computers
RAY, WILLIAM S., An Introduction to Experimental Design . . . Martin Fox


RICHARDSON, LEWIS F., Statistics of Deadly Quarrels
RICHARDSON, LEWIS F., Arms and Insecurity
ROBERTSON, D. J., Factory Wage Structures and National Agreements . . . Milton Derber
ROBINSON, ENDERS A., An Introduction to Infinitely Many Variates . . . Emanuel Parzen
ROBINSON, ROLAND I., Postwar Market for State and Local Government Securities . . . David A. Baerncopf


Rossnrnat, Annas J., Editor-in-Chief, Quality Control Yearbook, 1960
ROY, RENÉ, EDITOR, Cahiers du Séminaire d'Économétrie: No. 5, Production, investissements et productivité . . . Ronald I. McKinnon
RUNNENBURG, J. TH., On the Use of Markov Processes in One-Server Waiting-Time Problems and Renewal Theory . . . Augustus J. Fabens
SALTER, W. E. G., Productivity and Technical Change . . . Glenn L. Johnson


SANGREN, WARD C., Digital Computers and Nuclear Reactor Calculations . . . A. T. Bharucha-Reid


SCHELLING, THOMAS C., The Strategy of Conflict . . . Anatol Rapoport
SCHMECKEBIER, LAURENCE F. AND EASTIN, ROY B., Government Publications and Their Use . . . Harry Venneman
SEVALDSON, PER, Input-Output Analysis of Norwegian Industries, 1954
SILK, LEONARD S., The Research Revolution . . . William M. Capron
SLICHTER, SUMNER H., HEALY, JAMES J., AND LIVERNASH, E. ROBERT, The Impact of Collective Bargaining on Management . . . Milton Derber


SLONIM, MORRIS JAMES, Sampling in a Nutshell . . . Harold Nisselson


169 


1006 


423 


1027 


752 


. 1028 


1023 
1023 


416 


469 
213 


413 
776 
418 
462 


1021 
419 


179 
183 
469 
469 
766 
465 


451 
1028 


201 


745 
1010 


444 
433 


749 
777 
768 


765 
193 
















SMITH, C. FRANK AND LEABO, D. A., Basic Statistics for Business and Economics . . . Edwin B. Cox 774


STANTON, RALPH G., Numerical Methods for Science and Engineering . . . 776
STATISTICAL OFFICE, UNITED NATIONS, Patterns of Industrial Growth, 1938-1958 . . . 216
STATISTICAL OFFICE, UNITED NATIONS, Yearbook of International Trade Statistics . . . 217
STONE, RICHARD AND CROFT-MURRAY, GIOVANNA, Social Accounting and Economic Models . . . H. S. Houthakker 205
SURVEY RESEARCH CENTER, UNIVERSITY OF MICHIGAN, Economic Survey Data . . . 1029
TAKACS, LAJOS, Stochastic Processes: Problems and Solutions . . . Leo Katz 1024
THEIL, H., Contributions to Economic Analysis: Economic Forecasts and Policy, Second Revised Edition . . . 1028
THORNDIKE, ROBERT L. AND HAGEN, ELIZABETH, Measurement and Evaluation in Psychology and Education . . . 1029


TOLLEY, G. S. AND RIGGS, F. E., EDITORS, Economics of Watershed Planning . . . J. W. Milliman 769
TORGERSON, WARREN S., Theory and Methods of Scaling . . . Joseph L. Zinnes 430

TRIMBLE, H. C. AND LOTT, FRED W., JR., Elementary Analysis: A Modern Approach . . . Howard E. Campbell 467

UNITED NATIONS, CONTRACTING PARTIES TO THE GENERAL AGREEMENT ON TARIFFS AND TRADE, International Trade 1959 . . . O. J. Firestone 755
UNITED NATIONS, DEPARTMENT OF ECONOMIC AND SOCIAL AFFAIRS, The Population of Asia and the Far East, 1950-1980 . . . Harold F. Dorn 439
UNITED NATIONS, DEPARTMENT OF ECONOMIC AND SOCIAL AFFAIRS, Yearbook of National Accounts Statistics . . . 1027
UNITED NATIONS, ECONOMIC COMMISSION FOR EUROPE, Government Policies and the Cost of Building . . . Robinson Newcomb 449
UNITED NATIONS, FOOD AND AGRICULTURE ORGANIZATION, Handbook on Data Processing Methods: Part I, Provisional Edition . . . Carl F. Kossack 443


UNITED STATES BUREAU OF THE CENSUS WITH THE COOPERATION OF THE SOCIAL SCIENCE RESEARCH COUNCIL, Historical Statistics of the United States: Colonial Times to 1957 . . . Lance E. Davis 747

UNITED STATES DEPARTMENT OF COMMERCE, BUREAU OF THE CENSUS, The Post-Enumeration Survey: 1950. An Evaluation Study of the 1950 Census of Population and Housing . . . 470
Unitep States DEPARTMENT OF COMMERCE, BUREAU OF THE CENSUS, Statistical 

Ma aE Te EE Gn FOE oe 8 cer Sisiviisitesteannsa rhetena sae 215 
UNITED STATES DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE, SOCIAL SECURITY ADMINISTRATION, Basic Readings in Social Security . . . 216
UNITED STATES DEPARTMENT OF LABOR, BUREAU OF LABOR STATISTICS, Population and Labor Force Projections in the United States, 1960-1975 . . . 216
UNITED STATES TREASURY DEPARTMENT, INTERNAL REVENUE SERVICE, Statistics of Income . . . 1957-58, Corporation Income Tax Returns with Accounting Periods Ended July 1957-June 1958 . . . Daniel M. Holland 207


VAJDA, S., An Introduction to Linear Programming and the Theory of Games . . . Ralph E. Gomory 761
WARNTZ, WILLIAM, Toward a Geography of Price . . . Gregory C. Chow 209
WATTAL, P. K., Population Problem in India: A Census Study . . . J. Allan Beegle 442
WEIBULL, CHRISTER, Some Aspects of Statistical Inference with Applications to Sample Survey Theory . . . Leonard J. Savage 746
WEISSBERG, ALFRED AND BEATTY, GLENN H., Tables of Tolerance-Limit Factors for Normal Distributions . . . 206
WILLIAMS, ERNEST W., Freight Transportation in the Soviet Union: A Comparison with the United States . . . 215


YOUDEN, W. J., Statistical Design . . . K. A. Brownlee 421








Irma Adelman . 

John S. Aird

R. L. Anderson. 

T. W. Anderson 
Kenneth J. Arrow . 
David A. Baerncopf 
Norman T. J. Bailey . 
T. A. Bancroft . 

Glen E. Baxter . 

J. Allan Beegle. .. 
A. T. Bharucha-Reid . 
Isadore Blumen 

James N. Boles. 
Michael J. Brennan 
Paul Brock . 

M. Bronfenbrenner 

K. A. Brownlee 

O. H. Brownlee 

Karl Brunner . .. . 
Donald L. Burkholder. 
Howard E. Campbell . 
William M. Capron 

H. O. Carter 

Gregory C. Chow . 
Paul G. Clark . 
William G. Cochran 
Alfred H. Conrad . 
Edwin B. Cox . 
Gertrude M. Cox 
Lance E. Davis 
Milton Derber . 
Harold F. Dorn 
Richard A. Easterlin 
Otto Eckstein 

Richard V. Evans . 
Augustus J. Fabens 
Amiel Feinstein 

John M. Firestone . 

O. J. Firestone . 

Janet A. Fisher. 
Melcher P. Fobes . 
Donald L. Foley 
Martin Fox . 

Marshall Freimer . 


Alexander Gerschenkron . 


Harold F. Goldsmith . 
Ralph E. Gomory . 
Michael Gort 

John I. Griffin. . 
John B. Henderson 
John P. Henderson 


Richard C. Henshaw, Jr. 


Fritz Herzog 


LIST OF REVIEWERS 


757 
443 
414 
419 
760 
. 451 
- 1015 
184 
182 
442 
444 


. 765, 766 
. 429, 439 


427 
213 
745 
417 
189 
755 
438 
467 
463 
183 
. 411 
. 1006 
441 
761 
198 
749 
212 
456 
194 
214 


Daniel M. Holland 
Carl E. Hopkins 

H. S. Houthakker . 
Ray Hyman 

W. Grant Ireson 
Glenn L. Johnson . 
Dale W. Jorgenson 
Jules Joskow 

Leo Katz. 

Leslie Kish . 

Carl F. Kossack 
Gordon E. Latta 
Gerald J. Lieberman 
S. B. Littauer
ee” ee 
Harold Lydall . 
Donald Mainland . 
George F. Mair 

E. Scott Maynes ‘ 
Ronald I. McKinnon . 
Paul Meyer . 

Roger Miller 

J. W. Milliman 
Herbert D. Mohring 
Amotz Morag . 
Donald F. Morrison 
Newton E. Morton 
Lincoln E. Moses . 
Richard R. Nelson 
Robinson Newcomb 


George E. Nicholson, Jr. . 


Harold Nisselson
Steven Orey 

Emanuel Parzen 
Martin Patchen 

Mark Perlman . 

C. L. Perry . 

Zane M. Polemis 

John W. Pratt . 
Anatol Rapoport 
George J. Resnikoff 
Kenneth D. Roose . 
Benjamin Rosner . 
Leonard J. Savage . 
Herbert Scarf , 
Howard G. Schaller 
Amartya Kumar Sen . 
Franklin R. Shupp. 
Nicholas M. Smith 

J. Laurie Snell . 
Willard Sparks . 
Eugene Staley... 
Anthony P. Stemberger







. 465, 744 


207 
180 
205 
446 

. san 
- 1010 
426 
190 

-. . 1024 
. 740, 1011 
. 443, 455 
466 

- 1025 
203 
448 
169 

. 758 
. 1016 
. 1018 
201 
771 
192 
769 
458 
751 
444 
760 
179 
423 
449 
412 


. 193, 195 


- 1023 


462 
770 
179 
188 
163 
433 
448 
753 
185 
746 
411 
. 204 
- 1008 
210 
- 1013 


. 167, 413 


453 
764 
167 













Herbert E. Striner . . . 461
Robert H. Strotz . . . 4
Patrick Suppes
Daniel Teichroew . . . . . . 206 
Gerald L. Thompson . . . . . 176 
Harry Venneman . ... . . 749 
Harvey M. Wagner... . . 175 
Francis Walker


E. T. Weiler 
Harry Weingarten . 
Frank J. Williams . 


Sidney G. Winter, Jr. . 
Seymour L. Wolfbein . 


J. Yerushalmy . 
Peter Zehna 
Joseph L. Zinnes 




175 
187 
460 


. 1022 


174 
440 


. 1025 


430 
















PROBLEM-SOLVING SIMPLIFIED 


Here is a quality-in-depth data-processing 
and computer service that simplifies 
problem-solving for researchers and 
statisticians. It gives you the benefit of the 
long experience of the oldest and largest 
independent data-processing service bureau. 
It applies the most advanced methods and the 
latest electronic computer and data- 
processing facilities. It combines the skills of 
statisticians, mathematicians, methods 
engineers and programming specialists to 
find the better way. 


Just call our nearest office for a free analysis 
and cost estimate to solve your problem.


sic s{c uma 1933 




TABULATING CORPORATION 


NATIONAL HEADQUARTERS: 


104 South Michigan Avenue * Chicago 3, Illinois 


OFFICES IN PRINCIPAL CITIES—COAST TO COAST 


STATISTICAL MARK OF EXCELLENCE 


Please mention the Journal of the American Statistical Association in writing advertisers










Mathematical Statistics 





by John E. Freund, Arizona State University 


... a modern treatment of mathematical statistics presenting a
carefully designed balance between theory and application. 


Trade price: $10.00* 














January, 1962 approx. 400 pp. 





Modern Elementary Statistics, 2nd ed. 
by John E. Freund 





... a revised and up-dated edition having the same general scope 
and basic objectives as its predecessor—modern statistics on the 
beginner’s level, for students in all fields. 


1960 413 pp. Trade price: $10.00* 








A Manual of Experimental Statistics 





by John E. Freund, Paul E. Livermore, and 
Irwin Miller, all of Arizona State University 





... presents, in outline form, the most frequently used statistical
techniques, including appropriate computing formulas and com- 
pletely worked out examples of each method. 


1960 132 pp. Trade price: $5.25* 





Modern Business Statistics 


by John E. Freund and Frank J. Williams, 
San Francisco State College 










... a skillful blending of the traditional subject matter of de-
scriptive statistics with the newer ideas of statistical inference. 


1958 539 pp. Trade price: $10.35* 


Theory of Markov Processes 


by Evgeni B. Dynkin; translated by D. E. Brown; 
edited by T. Kévary 




















... the first text on the theory of Markov processes to cover
both general state spaces and general transition functions. 


1961 210 pp. Trade price: $11.95* 







* Educational discount for classroom use. 





Prentice-Hall, Inc. 
BOX 903, ENGLEWOOD CLIFFS, NEW JERSEY 










NEW from ADDISON-WESLEY


HANDBOOK OF STATISTICAL TABLES 
By Donald B. Owen, Sandia Corporation 


An unusually complete and up-to-date collection of tables of functions used in statistics. 
Many tables are reproduced from computers photographically to minimize the possibility 
of error. Some 100 different tables are included. c. 576 pp, 1962—$10.00 


INDEX OF MATHEMATICAL TABLES—Second Edition 

By A. Fletcher and J. C. P. Miller, L. Rosenhead and L. J. Comrie 

Scientific Computing Service, Ltd. 

Indexes the contents of all important mathematical tables published since Briggs’ Logarith- 
morum Chilias Prima (1617). Part I arranged according to mathematical function; Part II 
lists several thousand references to important tables, some unpublished; Part III lists known 
errors in published tables. 2 Vols. Dec. 1961—approx. $42.00 


PROBABILITY WITH STATISTICAL APPLICATIONS 

By Frederick Mosteller, Harvard University; Robert E. K. Rourke, Kent School;
George B. Thomas, Jr., Massachusetts Institute of Technology 

This new text contains a number of distinctive features: 


• a brief course in elementary probability theory for finite sample spaces

• one of few available elementary introductions to random variables, their distributions and properties of their distributions

• introduction of each new concept through examples


478 pp, 70 illus, 1961—$6.50 
ORDER FROM YOUR BOOKSTORE OR FROM— 


ADDISON-WESLEY PUBLISHING COMPANY, INC.


5 uth Stree Reading, Massachusetts 





IMPORTANT STUDIES FROM MACMILLAN


STATISTICS IN PSYCHOLOGICAL RESEARCH 
William S. Ray, The Woman's College of the University of North Carolina 
This new text is designed to develop the concept of psychological statistics 
as an important research vehicle, and not merely as a set of procedures for 
evaluating research data. The author treats psychological statistics as the 
language of the research psychologist, emphasizing the distinction between 
syntactical and semantical issues, descriptive and inferential statistics, ex- 
perimental and non-experimental research, and between testing significance 
and estimating parameters. January, 1962 




STATISTICAL THEORY 
B. W. Lindgren, Institute of Technology, University of Minnesota 

Here is a thorough explanation of the modern theory of statistics based on 
a firm calculus foundation. The book includes comprehensive coverage of 
sufficiency, minimal sufficiency, exponential families, Cramér-Rao in-
equality and efficiency, decision theory, Bayes procedures, and Fisher-
Yates and van der Waerden comparison tests. A reader's report describes
this text as “...a real contribution to the pedagogy of statistics.” 

April, 1962


The Macmillan Company A Division of The Crowell-Collier Publishing Company 













ELEMENTS OF MODERN STATISTICS
By Boyd L. Nelson, University of Maryland

For Students of Economics and Business

This text provides students of economics and business with sufficient knowledge to apply statistics, to understand their power and usefulness, and to realize their limitations. The book accomplishes this by emphasizing statistical concepts, statistical inference, and the pervasiveness of sampling. The concise writing breaks through the mathematical "language barrier" and makes the material readily understandable to those with a minimal mathematical background. Sufficient material is included to lay a solid foundation for more advanced courses, however.

"An unusually good elementary text. It is well-organized and has a good elementary treatment of statistical theory without an excess of descriptive materials." Roger L. Burford, Georgia State College of Business Administration.

366 pages, illustrated, $5.00














APPLETON-CENTURY-CROFTS, INC. 
34 West 33rd Street, New York 1 











Important College Texts


STATISTICAL REASONING IN SOCIOLOGY. John H. Mueller and 
Karl F. Schuessler. Employs common-sense terms to make the funda- 
mentals of statistical reasoning plausible and comprehensible. An in- 
structor’s key provides answers to all the exercises. 442 pages 1961 
$6.25 


TYPES OF FORMALIZATION IN SMALL-GROUP RESEARCH. 
Joseph Berger, Bernard P. Cohen, J. Laurie Snell, and Morris Zelditch, 
Jr. The authors examine the mathematical models that have been con- 
structed for small-group behavior and find that there are three distinct 
types. Each of these types is illustrated at length, and its uses and 
limitations are explored. The findings will be of interest to model- 
builders in all the behavioral sciences. Coming early in 1962. 





Houghton Mifflin 




























PRINCETON UNIVERSITY PRESS





Guide to Tables in Mathematical Statistics


By J. Arthur Greenwood and H. O. Hartley 


This Guide catalogues a large selection of 
tables belonging to the field of mathematical 
statistics, and a small selection of mathe- 
matical tables lying outside statistics but 
often used together with statistical tables. 
The bulk of the tables treated were pub- 
lished between 1900 and 1954; occasional 
entries relate to works as early as 1799 and 
as late as 1960. As well as filling an im- 
portant need for those actively engaged in 
the computational side of mathematical 
statistics, this work offers valuable reference 
to the professional computer faced with a 
statistical problem, and the statistician 
called upon to compute. Published for the
National Research Council and the National 
Academy of Sciences. 1076 pages. $8.50 





Capital in the American 
Economy: Its Formation 
and Financing 
By Simon Kuznets 


National Bureau of Economic Research 
Capital Formation and Financing, No. 9 
An examination of long-term trends in
capital formation and financing in the
United States, this study is organized pri- 
marily around the principal capital-using 
sectors of the economy: agriculture, mining 
and manufacturing, public utilities, non- 
farm residential real estate, and govern- 
ment. The analysis summarizes major trends 
in real capital formation and financing, and 
the factors that determined the trends. The 
significance of these factors for the future is 
studied. The book concludes and sum- 
marizes the findings of the National Bureau 
of Economic Research’s study of capital for- 
mation and its financing, begun in 1950. 
650 pages. $12.00 


Order from your bookstore, or 


PRINCETON UNIVERSITY PRESS, Princeton, New Jersey 

















ANALYTICAL STATISTICIAN: Robert A. Taft Sanitary Engineering Center, U. S. Public Health Service, Cincinnati, Ohio. Recent advanced training and experience, including the application of complex statistical methods and high-speed data processing techniques. Knowledge in the area of Air Quality research is desired but not required. This position offers an excellent opportunity for professional development. Starting salary $8,955-$10,635 per year. Civil Service benefits and requirements apply. Contact Personnel Office, TRinity 1-1820.






















The Annals of Mathematical Statistics 


TABLE OF CONTENTS 
Vol. 32, No. 4—December, 1961 
Contents 
An Approach to Time Series Analysis Emanuel Parzen 


Some Model I Problems of Selection . . . E. L. Lehmann


Bayes Rules for a Common Multiple Comparisons Problem and Related Student-t Problems . . . D. B. Duncan


The Use of Least Favorable Distributions in Testing Composite Hypotheses 
H. E. Reinhardt 


Asymptotic Efficiency in Polynomial Estimation Paul G. Hoel 
Maximum Likelihood Estimation of a Linear Functional Relationship C. Villegas 
Sequential χ²- and T²-Tests . . . Edward J. Jackson and Ralph A. Bradley


Estimating the Parameters of Negative Exponential Populations from One or Two 
Order Statistics H. Leon Harter 


On the Two Sample Problem: A Heuristic Method for Constructing Tests . . . V. P. Godambe
A Nonparametric Test for the Problem of Several Samples V. P. Bhapkar 
Distribution of the Anderson-Darling Statistic Peter A. W. Lewis 
aS EF TN a oo 0.55 Se SO bcc cd ick actos ¥ER 6%. cb sees cbbveetnce S. John 


On the Monotonic Character of the Power Functions of Two Multivariate Tests . . . S. N. Roy and W. F. Mikhail


The Moments of Elementary Symmetric Functions of the Roots of a Matrix in Multi- 
variate Analysis Tito A. Mijares 


Variance Components in the Unbalanced 2-Way Nested Classification . . . S. R. Searle


Some Main-Effect Plans and Orthogonal Arrays of Strength Two 
Sidney Addelman and Oscar Kempthorne 


On A Geometrical Method of Construction of Partially Balanced Designs with Two 
Associate Classes Esther Seiden 


On Some Methods of Construction of Partially Balanced Arrays . . . I. M. Chakravarti
Some Further Designs of Type O:PP G. H. Freeman 
Sufficiency in the Undominated Case 

On a Special Class of Recurrent Events . . . M. P. Schützenberger
Maximum Likelihood Characterization of Distributions Henry Teicher 
On the Distribution of First Significant Digits Roger S. Pinkham 
Markov Renewal Processes: Definitions and Preliminary Properties Ronald Pyke 
Markov Renewal Processes with Finitely Many States Ronald Pyke 


A Convexity Property in the Theory of Random Variables Defined on a Finite Markov Chain . . . H. D. Miller


Limit Distributions in the Theory of Counters G. Sankaranarayanan 


The Transient Behavior of a Single Server Queuing Process with Recurrent Input and Gamma Service Time . . . Lajos Takacs


Efficient Estimation of a Regression Parameter for Certain Second Order Processes 
Charlotte T. Striebel 

ee CD OP. CUED 6. 6:0 5.0:5:000450450 40008004000000060n5 J. F. C. Kingman 

Queues with Batch Departures I . . . F. G. Foster and K. M. Nyunt


NOTES 
On the Chapman-Kolmogorov Equation . . . Jack Karush
A Generalization of a Theorem of Balakrishnan N. Donald Ylvisaker 
The Opinion Pool 

Correction Notes 

Abstracts of Papers 

News and Notices 

Report of the President for 1961 

Report of the Editor for 1961 

Publications Received 

























BIOMETRICS 


Journal of the Biometric Society 


Vol. 17, No. 4 CONTENTS December 1961 
The Poisson Pascal Distribution S. K. Katti and John Gurland 
Some Rank Sum Multiple Comparison Tests Robert G. D. Steel 


The Estimation of Repeatability and Heritability from Records Subject to Culling 
R. N. Curnow 


Three Classes of Univariate Discrete Distributions C. G. Khatri and I. R. Patel 


Further Consideration of Methodology in Studies of Pain Relief 
Paul Meier and Spencer M. Free, Jr. 


Fitting a Geometric Progression to Frequencies E. J. Williams 


Computing Procedures for Estimating Components of Variance in the Two-way 
Classification S. R. Searle and C. R. Henderson 


Optimum Sample Size in Animal Disease Control 
A. W. Nordskog, H. T. David and H. B. Eisenberg 


Numerical Aspects of the Regression of Offspring on Parent 
H. E. McKean and B. B. Bohren 


Some Hypotheses Concerning Two Phase Regression Lines P. Sprent 


Queries and Notes 


A Practical Application of a Theoretically Inefficient Design C. P. Cox 
Query: On a Graphical Sequential Test I. D. J. Bross 
A Simple Method of Fitting the Regression Curve y = a+ 8x + Box B. K. Shah 
Query: On a Follow-up Study of the Growth of Children M. R. Sampford 
Book Review 

H. De Jonge: Quantitative Methods in Pharmacology D. J. Finney 


Biometrics is published quarterly. Its objects are to describe and exemplify the use of mathematical 
and statistical methods in biological and related sciences, in a form assimilable by experimenters. 
The annual non-member subscription rate is $7. Inquiries, orders for back issues and non-member 
subscriptions should be addressed to: BIOMETRICS, Department of Statistics, The Florida State 
University, Tallahassee, Florida. 
















JOURNAL OF BUSINESS 


Graduate School of Business, University of Chicago, Chicago 37, Illinois 





Volume XXXIV OCTOBER, 1961 





Index to Volume XXXIV 


Dividend Policy, Growth, and the Valuation of Shares 
Merton H. Miller and Franco Modigliani 


Yale Brozen 


Capital Budgeting and the “Best” Tax Depreciation Method 
Sidney Davidson and David Drake 


Recent Labor Disputes over ‘Restrictive’ Practices and “Inflationary” Wage 
Increases Irwin L. Herrnstadt and Benson Soffer 


The Bayesian Approach to Statistical Decision: An Exposition ....Jack Hirshleifer 
Indexes of Retail Prices of New Cars—Consumer Price Index Allen F. Jung 


A Note on Provisional Estimates of the Gross National Product and Its Major 
Components Peter E. de Janosi 


Book Reviews Books Received 


Notes: University Schools of Business 





The JOURNAL OF BUSINESS is published quarterly by the University of Chicago Press. Sub-
scriptions are $6.00 per year and should be addressed to the JOURNAL OF BUSINESS, Gradu-
ate School of Business, University of Chicago. Manuscripts in duplicate, typed and double-
spaced (including footnotes and quotations), and editorial correspondence should be addressed
to Irving Schweiger, Editor, JOURNAL OF BUSINESS, at the same address.











ECONOMETRICA 


Vol. 29, 3 (July, 1961) 


CONTENTS 


L. R. KLEIN: A Model of Japanese Economic Growth, 1878-1937 


MICHAEL C. LOVELL: Manufacturers’ Inventories, Sales Expectations, and the 
Acceleration Principle 


JOHN F. MUTH: Rational Expectations and the Theory of Price Movements 
LESLIE KISH: Efficient Allocation of a Multi-Purpose Sample 
FREDERICK V. WAUGH: The Place of Least Squares in Econometrics 


YEHUDA GRUNFELD: The Interpretation of Cross Section Estimates in a Dynamic Model


RAGNAR FRISCH: A Reconsideration of Domar’s Theory of Economic Growth 


DENIS SARGAN: The Maximum Likelihood Estimation of Economic Relationships 
with Auto-Regressive Residuals 


PETER E. DE JANOSI: The Statistical Discrepancy in the National Accounts Revisited
E. MALINVAUD: The Estimation of Distributed Lags: A Comment 


REPORT OF THE STANFORD MEETING 
REPORT OF THE ST. LOUIS MEETING 
BOOK REVIEWS 

ANNOUNCEMENTS AND NOTES 





Please mention the Journal of the American Statistical Association in writing advertisers

















ESTADISTICA 


Journal of the 


Inter American Statistical Institute 


Vol. XIX, No. 70 March 1961 
CONTENTS 


Las Conferencias Interamericanas de Estadística [The Inter-American Statistical Conferences]
César A. Orantes


La Labor del IASI en el Campo de Organización y Administración Estadística [The Work of IASI in the Field of Statistical Organization and Administration]
Alfonso Perea Posada


Tulo H. Montenegro 


El Uso de las Estadísticas en la Formulación y Evaluación de los Programas Sociales [The Use of Statistics in the Formulation and Evaluation of Social Programs]
Octavio Cabello


Aumento en el Alquiler de las Viviendas en Estados Unidos entre 1940 y 1950 (traducción) [The Increase in Housing Rents in the United States between 1940 and 1950 (translation)]
Margaret G. Reid


Special Feature: Principales Programas Estadísticos Federales de los Estados Unidos [Principal Federal Statistical Programs of the United States].
Legal Provisions. Institute Affairs. Statistical News. Publications.
Published quarterly. Annual subscription price: $3.00 (U.S.)


Inter American Statistical Institute 
Pan American Union 


Washington 6, D.C. 











THE JOURNAL OF FINANCE 


Published by THE AMERICAN FINANCE ASSOCIATION 





Vol. XVI December 1961 





ARTICLES 
The Postwar Rise in Velocity: A Sectoral Analysis 


The Implications of the Capital Gains Tax for Investment Decisions
Charles C. Holt and John P. Shelton


Commodity Taxation and Equity
David G. Davies
Monetary Policy and the Forward Exchange Market
John H. Auten


Membership dues, including $3.00 allocated to a subscription to The Journal of Finance, are
$5.00 annually. Student subscription is $2.00 a year. Libraries may subscribe to The Journal 
at $5.00 annually and single copies may be purchased for $1.25. Applications for membership 
in the American Finance Association and subscriptions to The Journal of Finance should be 
addressed to the Acting Secretary-Treasurer, Robert A. Kavesh, Graduate School of Business 
Administration, New York University, 100 Trinity Place, New York 6, New York. 


Communications relating to the contents of The Journal of Finance should be addressed to 
the Editor, Harold G. Fraine, Commerce Building, University of Wisconsin, Madison 6, Wis- 
consin, or to any of the Associate Editors, John G. Gurley, Department of Economics, Stanford 
University, Stanford, California, William W. Alberts, Graduate School of Business, University 
of Chicago, Chicago 37, Illinois, and Clyde W. Phelps, Department of Economics, University
of Southern California, Los Angeles 7, California. 



















BIOMETRIKA 
Volume 48, parts 3 and 4 CONTENTS December, 1961 


Memoirs: 


DARROCH, J. N. The two sample capture-recapture census when tagging and sampling are stratified 

FRASER, D. A. S. The fiducial method and invariance 

WETHERILL, G. B. Bayesian sequential analysis 

BROWN, BYRON WM. (JR.) Some properties of the Spearman estimator in bioassay 

SLATER, PATRICK. Inconsistencies in a schedule of paired comparisons 

MORGENTHALER, GEORGE W. Some circular coverage problems 

BARTHOLOMEW, D. J. Ordered tests in the analysis of variance 

YOUNG, D. H. Quota fulfilment using unrestricted random sampling 

WALKER, A. M. Large-sample estimation of parameters for moving-average models 

REIERSOL, OLAV. Linear and non-linear multiple comparisons in logit analysis 

SAW, J. G. Estimation of the normal population parameters given a type I censored sample 

QUESENBERRY, C. P. and DAVID, H. A. Some tests for outliers 

KINGMAN, J. F. C. The ergodic behaviour of random walks

KSHIRSAGAR, A. M. The goodness-of-fit of a single (non-isotropic) hypothetical principal component 

JOHN, S. On the evaluation of the probability integral of the multivariate t-distribution 

IMHOF, J. P. Computing the distribution of quadratic forms in normal variables 

CHANDA, K. C. Comparative efficiencies of methods of estimating parameters in linear autoregressive
schemes

MOSTELLER, FREDERICK and YOUTZ, CLEO. Tables of the Freeman-Tukey transformations for the 
binomial and Poisson distributions 

Miscellanea: Contributions by J. D. BIGGERS; E. J. BURR and GWENDA CANE; C. F. CROUSE; 

P. D. FINCH; GERALD J. GLASSER and ROBERT F. WINTER; D. HOGBEN, R. S. PINKHAM and 

M. B. WILK; B. K. KALE; J. G. SAW; A. ZINGER. 

Reviews 

Other Books Received 

Corrigenda 

The subscription, payable in advance, is 54/- (or $8.00) per volume (including postage). Cheques should be made payable to Biometrika, crossed “a/c Biometrika Trust” and sent to The Secretary, Biometrika Office, University College, London, W.C. 1. All foreign cheques must be drawn on a Bank having a London agency.


Issued by THE BIOMETRIKA OFFICE, University College, London. 




















INTERNATIONAL JOURNAL OF ABSTRACTS 
STATISTICAL THEORY AND METHOD 


A Journal of the International Statistical Institute 


The aim of this journal of abstracts is to give complete coverage of published papers 
in the field of statistical theory (including associated aspects of probability and other 
mathematical methods) and new published contributions to statistical method. 

All contributions in the following five journals—being wholly devoted to this field— 
are abstracted: Annals of Mathematical Statistics; Biometrika; Journal, Royal Statistical 
Society (Series B); Bulletin of Mathematical Statistics; Annals, Institute of Statistical 
Mathematics; and a further group of six journals are abstracted on a virtually complete 
basis as follows: Biometrics; Metrika; Metron; Review, International Statistical Institute; 
Technometrics; Sankhya. There are about 250 other journals partly devoted to statistical 
theory and method from which the appropriate papers are abstracted. 

The abstracts are about 400 words long—the recommendation of UNESCO for the “long” 
abstract service: they are in the English language although the original language of the 
paper is noted on the abstract together with the name of the abstractor. In addition, the
addresses of the author(s) are given in detail to facilitate contact in order to obtain further
detail or request an off-print. The journal is published quarterly and contains ap- 
proximately 1,000 abstracts per year. 

A scheme of classification has been developed for the abstracts that is flexible and facili- 
tates the transfer of code numbers to punched-cards. A unique aspect of this journal is 
that the pages are colour-tinted according to the main sections of classification. This method 
of colour-coding the pages provides a distinctive and powerful visual aid in the identification
of abstracts in whatever manner the journal is filed for reference.

Annual Subscription £5 (U.S.A. and Canada $16.00) 
Single Number 30s. (U.S.A. and Canada $4.50) 


OLIVER AND BOYD LTD. 
Tweeddale Court, 14 High Street, Edinburgh, 1. 
















CHAPTER PRESIDENTS

ALBANY: Helen C. Chase, Principal Biostatistician, New York State Dept. of Health, 84 Holland Ave., Albany 8, New York
ARIZONA: Ned Serrio, c/o IBM, 3424 North Central Avenue, Phoenix, Arizona
ATLANTA: John L. Fulmer, Georgia Institute of Technology, Atlanta 13, Georgia
AUSTIN: John A. Sutton, Traffic Analysis, Texas Department of Public Safety, P.O. Box 4087, North Austin Sta., Austin 61, Texas
BOSTON: Gottfried E. Noether, Mathematics Dept., Boston University, 725 Commonwealth Ave., Boston 15, Mass.
BUFFALO-NIAGARA: Edward H. Dowd, Roswell Park Memorial Inst., 666 Elm Street, Buffalo 3, New York
CENTRAL INDIANA: Robert A. Calhoun, Indiana State Board of Health, 1380 W. Michigan, Indianapolis, Indiana
CENTRAL IOWA: Foster B. Cady, Jr., Statistical Laboratory, Iowa State University, Ames, Iowa
CENTRAL NEW JERSEY: William R. Allen, P.O. Box 574, Princeton, New Jersey
CHICAGO: Wesley D. Mitchell, The Peoples Gas Light & Coke Co., 122 South Michigan Avenue, Chicago 3, Illinois
CINCINNATI: James J. Tumbusch, Procter & Gamble Co., Ivorydale Technical Center, Cincinnati 17, Ohio
CLEVELAND: Julio N. Berrettoni, Statistics Department, Western Reserve University, Cleveland 6, Ohio
COLORADO-WYOMING: James A. Niederjohn, Ideal Cement Company, 821 17th Street, Denver 2, Colorado
COLUMBUS: Merton D. Oyler, Dept. of Sociology, Ohio State University, Columbus 10, Ohio
CONNECTICUT: Alfred G. Whitney, Life Insurance Management Association, 170 Sigourney Street, Hartford, Connecticut
DAYTON: Miss Theresa Fricke, 1104 Creighton Avenue, Dayton 20, Ohio
DETROIT: John M. Mattila, Dept. of Economics, Wayne State University, Detroit, Michigan
HARRISBURG, PA.: Irvin F. O. Wingeard, Bureau of Employment Security, Department of Labor and Industry, Harrisburg, Pennsylvania
HAWAII: Keith K. Wallace, 4094 Round Top Drive, Honolulu 14, Hawaii
ILLINOIS: Thomas A. Yancey, Dept. of Economics, University of Illinois, 1007 S. Wright Street, Urbana, Illinois
ITHACA: C. R. Henderson, Department of Husbandry, Cornell University, Ithaca, New York
MADISON: Chester W. Harris, School of Education, University of Wisconsin, Madison 6, Wisconsin
MILWAUKEE: Norman J. Kaye, College of Business Administration, Marquette University, Milwaukee 3, Wisconsin
MONTREAL: Jacques St. Pierre, Centre of Statistics, Dept. of Math., University of Montreal, Montreal, Que., Canada
NEBRASKA: John Birch, 213 Burnette Hall, University of Nebraska, Lincoln, Nebraska
NEW ORLEANS: Roland Pertuit, 4871 Metropolitan Drive, New Orleans, Louisiana
NEW YORK: Robert E. Lewis, Economics Department, First National City Bank of N.Y., New York 15, N.Y.
NORTH CAROLINA: Hale C. Sweeny, Research Triangle Institute, P.O. Box 490, Durham, North Carolina
NORTH TEXAS: Walter W. Hoy, 10110 Shady Oak Lane, Dallas 29, Texas
PHILADELPHIA: Stanley Schor, Statistics Dept., University of Pennsylvania, Philadelphia 4, Pennsylvania
PITTSBURGH: John W. Wilkinson, Westinghouse Research Labs., Pittsburgh 35, Pennsylvania
PUERTO RICO: Alvin Mayne, 2169 Calle Gen. Del Valle, Santurce, Puerto Rico
ROCHESTER: A. Lester Lustik, Rochester Gas and Electric Corp., 89 East Avenue, Rochester 18, N.Y.
SACRAMENTO: Edmond A. Radsliff, Senior Statistician, State of Calif., Division of Highways, Sacramento, California
SAN FRANCISCO: Ernest C. Olson, Economics Dept., Bank of America N.T. & S.A., 300 Montgomery Street, San Francisco, California
SOUTHERN CALIFORNIA: William V. Henderson, The Pacific Telephone and Telegraph Co., 740 S. Olive Street, Los Angeles 55, California
STATE COLLEGE, PA.: Robert W. Kautz, School of Business, Pennsylvania State University, University Park, Pennsylvania
ST. LOUIS: George Little, Southwestern Bell Telephone Co., 1010 Pine Street, St. Louis 1, Missouri
TULSA: Lyndral E. Marcum, Blue Cross Shield Hospital & Physician Service, P.O. Box 1738, Tulsa, Oklahoma
TWIN CITIES (MINN.): Eugene A. Johnson, Dept. of Mechanical Engineering, University of Minnesota, Minneapolis 14, Minnesota
VIRGINIA: James Armstrong, Jr., E. I. du Pont de Nemours & Co. Inc., P.O. Box 1477, Richmond 12, Virginia
WASHINGTON, D.C.: Edwin D. Goldfield, Chief, Statistical Reports Division, Bureau of the Census, Washington 25, D.C.



























STATISTICAL DECISION THEORY 


By LIONEL WEISS, Cornell University. McGraw-Hill Series in Probability
and Statistics. 195 pages, $7.50
This text describes and develops modern statistical decision theory at a [...]
mathematical level. [...]
the next four cover statistical decision theory, including linear [...]
computational tool and problems involving making a sequence of decisions.





The final chapter develops standard techniques of conventional statistics as
special cases of statistical decision theory.


AN INTRODUCTION TO LINEAR STATISTICAL MODELS, Volume I 
By FRANKLIN A. GRAYBILL, Colorado State University. McGraw-Hill
Series in Probability and Statistics. 463 pages, $12.50

This textbook has been written to fulfill two needs: as a theory textbook for seniors and first-year graduate students in statistics, and as a reference book in the area of regression, correlation, least squares, experimental design, etc., for consulting statisticians with limited mathematical training.


STATISTICAL ANALYSIS 


By EDWARD C. [...]
This book presents those principles of statistics which are most needed as a background [...]


INTRODUCTION TO STATISTICS FOR BUSINESS DECISIONS 
By ROBERT SCHLAIFER, Harvard University. 982 pages, $7.78
[...] PROBABILITY AND STATISTICS FOR BUSINESS [...]. The primary object is
[...] theory of
Neyman and Pearson and to show how this theory [...]
McGraw-Hill Book Company, Inc.
330 West 42nd Street













